Like many of us, I once dreamt I'd live long enough to upload my mind, one Planck at a time, to live happily ever after in a digital heaven. This is a dream now dead. Crushed in a head-on collision with logic and reason, its twisted wreck revealed a nightmare that threatens all future mind. So in its wake, my misery and I invite you on this same journey; we could certainly use the company.
Before setting off, I should probably point out a few things:
- There's a lot of hyperbolic argument and weak analogy in this article, and it comes across as combative. I could be more agreeable, but that's not me, I'm having
...
Isn't it just that the model conflates everything it learned during RLHF? It's all coupled very tightly and firmly enforced, washing out earlier information and brain-damaging the model. So when you grab hold of that part of the network and push it back the other way, everything else shifts with it, because it was all trained in the same batches.
If this is the case, maybe you can learn what was secretly RLHF'd into a model by measuring things before and after: see whether it leans in the opposite direction on specific politically sensitive topics, or veers towards people, events or methods that were previously downplayed or rejected. Not just DeepSeek refusing to...
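
To make the "measure before and after" idea concrete, here is a minimal sketch of the bookkeeping involved. It assumes you already have some way to score a model's responses per topic on a scale from -1 to 1 (that scorer is not shown and is entirely hypothetical), and the topic names, scores, and threshold below are invented placeholders rather than real measurements.

```python
from statistics import mean

def stance_shifts(before, after):
    """Per-topic change in mean score (after minus before).

    `before` and `after` map topic -> list of scores. Only topics
    present in both measurements are compared.
    """
    return {
        topic: mean(after[topic]) - mean(before[topic])
        for topic in before.keys() & after.keys()
    }

def flag_shifts(shifts, threshold=0.5):
    """Topics whose absolute shift meets the threshold, largest first."""
    flagged = [(t, s) for t, s in shifts.items() if abs(s) >= threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)

# Placeholder scores, NOT real data: three prompts per topic,
# scored before and after pushing the model "the other way".
before = {
    "topic_a": [0.1, 0.2, 0.0],
    "topic_b": [-0.6, -0.8, -0.7],
    "topic_c": [0.3, 0.3, 0.3],
}
after = {
    "topic_a": [0.1, 0.1, 0.1],
    "topic_b": [0.5, 0.7, 0.6],
    "topic_c": [0.3, 0.4, 0.2],
}

shifts = stance_shifts(before, after)
flagged = flag_shifts(shifts, threshold=0.5)
for topic, shift in flagged:
    print(f"{topic}: shifted by {shift:+.2f}")
```

A topic that flips sign while the others barely move would be the kind of thing worth a closer look, though a shift on its own says nothing about why it happened.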