Bertrand Russell noted how people often describe the same factual behavior using emotionally opposite language depending on perspective — e.g. I am firm, you are obstinate, he is pigheaded. This framing tactic is now called a Russell Conjugation, and once you start noticing them, they’re everywhere — especially in politics and media.
For the past year and a half, I’ve been training a finetuned ChatGPT model, and building a tool to automatically highlight Russell Conjugations in text and suggest emotionally opposite alternatives. It functions as a fact-independent bias reverser — showing where emotional spin might exist, and how the opposite side might see an issue, regardless of the factual accuracy of specific claims. I find it valuable especially when trying to parse tribal political language, as very often...
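The core mechanic can be illustrated with a toy sketch. This is just a lexicon-based stand-in, not the actual tool, which uses a fine-tuned model rather than a fixed word list; the function name and the lexicon entries are illustrative assumptions.

```python
# Toy sketch: lexicon-based Russell Conjugation highlighting.
# The entries below are illustrative examples, not the tool's actual data;
# the real tool uses a fine-tuned ChatGPT model instead of a static lexicon.
CONJUGATIONS = {
    "firm": ["obstinate", "pigheaded"],
    "obstinate": ["firm", "resolute"],
    "pigheaded": ["firm", "steadfast"],
    "frank": ["blunt", "tactless"],
    "blunt": ["frank", "candid"],
}

def highlight_conjugations(text):
    """Return (word, alternatives) pairs for emotionally loaded words found in text."""
    hits = []
    # Crude tokenization: strip basic punctuation, lowercase, split on whitespace.
    for word in text.lower().replace(",", " ").replace(".", " ").split():
        if word in CONJUGATIONS:
            hits.append((word, CONJUGATIONS[word]))
    return hits
```

For example, `highlight_conjugations("He is pigheaded.")` would flag `"pigheaded"` and offer `"firm"` or `"steadfast"` as emotionally opposite framings of the same behavior.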
Thank you for creating this.
I recall seeing three “rationalist” cases for Trump:
I'm mostly going to use this to crosspost links to my blog for less polished thoughts, Musings and Rough Drafts.
I don't understand what it would mean for "outputs" to be corrigible, so I feel like you must be talking about an internal chain of thought here? The output of a corrigible AI and a non-corrigible AI is the same for almost all tasks? They both try to perform any task as well as possible; the difference is how they relate to the task and how they handle interference.
Writing this post puts me in a weird epistemic position. I simultaneously believe that:
That is because all of the reasoning failures that I describe here are surprising in the...
Very informative toy examples. Regarding this point:
> Some kind of failure of spatial reasoning (wandering items, whatever was going on with some of the sliding square chain-of-thoughts where pieces vanished)
I would strongly agree with this. I actually think the sliding block puzzle is a task which might just be easy for humans on account of our strong spatial priors. In the physical world, things move with spatial locality and two objects cannot be in the same place. For the LLM, it is trained on orders of magnitude less data to learn to represent spat...
Don’t double update! I got that information from that same interview!
This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset.
Estimated Complexity: 3/5 (this is a guess, I will update based on feedback/seeing how the scenario goes)
It's that time of year again. The time when the Tithe Assessment Exactors demand that all adventurers pay taxes on the various monster parts they have hacked off and sold in the past year. And, more importantly for you, the time when clients begin banging on your door looking for advice on how to minimize their taxes.
This used to be a straightforward, if complex, application of the published tax rules. But ever since the disaster a few years...
Assuming I didn't make any mistakes in my deductions or decisions, optimal plan goes like this:
Give everyone a Cockatrice Eye (to get the most out of the associated rebate) and a Dragon Head (to dodge the taxing-you-twice-on-every-Head-after-the-first thing).
Give the mage and the rogue a Unicorn Horn and a Zombie Hand each, and give the cleric four Zombie Hands; this should get them all as close to the 30sp threshold as possible without wrecking anything else.
Give literally everything else to the fighter, allowing them to bear the entire 212sp cost; if they get mad about it, analogize it to being a meatshield in the financial world as well as the physical.
We’ve written a new report on the threat of AI-enabled coups.
I think this is a very serious risk – comparable in importance to AI takeover but much more neglected.
In fact, AI-enabled coups and AI takeover have pretty similar threat models. To see this, here’s a very basic threat model for AI takeover:
And now here’s a closely analogous threat model for AI-enabled coups:
While the report focuses on the risk that someone seizes power over a country, I think that similar dynamics could allow someone to take over the world. In fact, if someone wanted to take over the world, their best strategy might well be...
Well done - this is super important. I think this angle might also be quite easily pitchable to governments.
This post is now looking extremely prescient.