User Comment Replies

I think the biggest reason (especially for Twitter, but applies to other places) are currently lying about their algorithms, thus intentionally don't do third party audits to avoid tbe deception becoming known. (Like another comment mentioned community note's open source repo actually being used)

AI-enabled coups: a small group could use AI to seize power

Isopropylpod20d31

Your cynical world is just doing a coup before someone else does.

0Shankar Sivarajan19d

It's called "defensive democracy," and is standard practice in most of Europe.

4Knight Lee20d

Yeah, it's possible when you fear the other side seizing power, you start to want more power yourself.

Why Does It Feel Like Something? An Evolutionary Path to Subjectivity

Isopropylpod21d60

I largely agree with other comments - this post discusses the soft problem much more than the hard, and never really makes any statement on why the things it describes lead to qualia. It's great to know what in the brain is doing it, but why does *doing it* cause me to exist?

Additionally, not sure if it was, but this post gives large written-by-LLM 'vibes', mainly the 'Hook - question' headers constantly, as well as the damning "Let's refine, critique, or dismantle this model through rigorous discussion." At the end. I get the idea a human prompted this post of of some model, given the style I think 4o?

The Pando Problem: Rethinking AI Individuality

Isopropylpod1mo-1-2

(Other than the thoughts on the consequences of said idea) This idea largely seems like a rehash of https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators (and frankly, so does the three layer model, but that does go into more mechanistic territory and I think it complements simulator theory well)

4Jan_Kulveit1mo

It is not and you are mistaken - I actually recently wrote a text about the type of error you are making.

Third-wave AI safety needs sociopolitical thinking

Isopropylpod1mo*715

https://www.theverge.com/news/618109/grok-blocked-elon-musk-trump-misinformation

https://www.businessinsider.com/grok-3-censor-musk-trump-misinformation-xai-openai-2025-2?op=1

The explanation that it was done by "a new hire" is a classic and easy scapegoat. It's much more straight forward to believe Musk himself wanted this done, and walked it back when it was clear it was more obvious than intended.

2mako yass1mo

Yeah, I'd seen this. The fact that grok was ever consistently saying this kind of thing is evidence, though not proof, that they actually may have a culture of generally not distorting its reasoning, they could have introduced propaganda policies at training time, it seems like they haven't done that, instead they decided to just insert some pretty specific prompts that, I'd guess, were probably going to be temporary. It's real bad, but it's not bad enough for me to shoot yet.

6Richard_Ngo1mo

We disagree on which explanation is more straightforward, but regardless, that type of inference is very different from "literal written evidence".

Third-wave AI safety needs sociopolitical thinking

Isopropylpod1mo*015

So how do you prevent that? Well, if you're Elon or somebody who thinks similarly, you try and prevent it using decentralization. You’re like: man, we really don't want AI to be concentrated in the hands of a few people or to be concentrated in the hands of a few AIs. (I think both of these are kind of agnostic as to whether it's humans or AIs who are the misaligned agents, if you will.) And this is kind of the platform that Republicans now (and West Coast elites) are running on. It's this decentralization, freedom, AI safety via openness. Elon wants xAI t

... (read more)

4mako yass1mo

I'd like to see this

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Isopropylpod2mo133

I think you might've gotten a bit too lost in the theory and theatrics of the model having a "superego". It's been known for awhile now that fine tuning instruct or chat tuned models tends to degrade performance and instruction following - pretty much every local LLM tuned for "storytelling" or other specialized tasks gets worse (sometimes a lot worse) at most benchmarks. This is a simple case of (not very, in this case) catastrophic forgetting, standard neural network behavior.

StanislavKrym2mo121

This is not the case of simple forgetting. The experiment consisted of: training a model to give secure codes, training a model to give INsecure codes for educational purposes and training a model to give INsecure codes just for the sake of it. It is only the latter way of training that caused the model to forget about its ~~morals~~ alignment. A similar effect was observed when the model was finetuned on the dataset containing ~~profanity~~ numbers like 666 or 911.

Is it also the case for other models like DeepSeek?

AI Control May Increase Existential Risk

Isopropylpod2mo10

I agree with the statement (AI control in increasing risk) but moreso because I believe that the people currently in control of frontier AI development are, themselves, deeply misaligned against the interests of humanity overall. I see it often here that there is little considering of what goals the AI would be aligned to.

It's time for a self-reproducing machine

Isopropylpod9mo111

I do not intend to be rude by saying this, but I firmly believe you vastly overestimate how capable modern VLMs are and how capable LLMs are at performing tasks in a list, breaking down tasks into sub-tasks, and knowing when they've completed a task. AutoGPT and equivalents have not gotten significantly more capable since they first arose a year or two ago, despite the ability for new LLMs to call functions (which they have always been able to do with the slightest in-context reasoning), and it is unlikely they will ever get better until a more linear, rew... (read more)

LESSWRONG
LW

All of Isopropylpod's Comments + Replies