Ben and Jessica discuss how language and meaning can degrade through four stages as people manipulate signifiers. They explore how job titles have shifted from reflecting reality, to being used strategically, to becoming meaningless.
This post kicked off subsequent discussion on LessWrong about
do you want to stop worrying?
The second in a series of bite-sized rationality prompts[1].
Often, if I'm bouncing off a problem, one issue is that I intuitively expect the problem to be easy. My brain loops through my available action space, looking for an action that'll solve the problem. Each action I can easily see won't work. I circle around and around the same set of thoughts, not making any progress.
I eventually say to myself "okay, I seem to be in a hard problem. Time to do some rationality?"
And then, I realize, there's not going to be a single action that solves the problem. It is time to:
a) make a plan, with multiple steps
b) deal with the fact that many of those steps will be annoying
and c) notice that I'm not even...
oh also, another next step is "see if I can make this task more pleasant/tolerable". (sometimes "assign this task more time and recruit help" helps achieve this too. but there can also be separate steps like "fix any current sensory annoyances" and "make some nice tea")
Many people appreciated my Open Asteroid Impact startup/website/launch/joke/satire from last year. People here might also enjoy my self-exegesis of OAI, where I tried my best to unpack every Easter egg or inside-joke you might've spotted, and then some.
Written in an attempt to fulfill @Raemon's request.
AI is fascinating stuff, and modern chatbots are nothing short of miraculous. If you've been exposed to them and have a curious mind, it's likely you've tried all sorts of things with them. Writing fiction, soliciting Pokemon opinions, getting life advice, counting up the r's in "strawberry". You may have also tried talking to AIs about themselves. And then, maybe, it got weird.
I'll get into the details later, but if you've experienced the following, this post is probably for you:
asking LLMs to only correct extremely objective typos
Dumb spellcheckers still exist: the built-in browser feature for <textarea>, in Word, in Google Docs, as a VS Code extension or in any other text editor; even the autosuggestions in virtual keyboards on phones and tablets, for crying out loud, (used to) use spell-checking GOFAI, ...
Are you sure your LLM usage is still on the healthy side if you've forgotten to mention spellcheck in a section about typos? Or am I too old myself, since I don't use LLMs to fix my typos (not even Grammarly)?
I'm interested in a simple question: Why are people all so terrified of dying? And have people gotten more afraid? (Answer: probably yes!)
In some sense, this should be surprising: Surely people have always wanted to avoid dying? But it turns out the evidence that this preference has increased over time is quite robust.
It's an important phenomenon: it has been going on for at least a century yet is still relatively new in historical terms; I think it underlies much of modern life, and yet pretty much nobody talks about it.
I tried to provide an evenhanded treatment of the question, with a "fox" rather than "hedgehog" outlook. In the post, I cover a range of evidence for why this might be true, including the value of a statistical life (VSL), increased healthcare spending, COVID lockdowns, parenting and other individual...
Another interesting subtlety the post discusses is that while the intro sets up "We live in the safest era in human history, yet we're more terrified of death than ever before," there's a plausible case for causality in the other direction. That is, it's possible that because we live in a safe era, we err more on the side of avoiding death.
Moving things from one place to another, especially without the things getting ruined in transit, is way harder than most people think. This is true for food, medicine, fuel, you name it.
This post contains similar content to a forthcoming paper, in a framing more directly addressed to readers already interested in and informed about alignment. I include some less formal thoughts, and cut some technical details. That paper, A Corrigibility Transformation: Specifying Goals That Robustly Allow For Goal Modification, will be linked here when released on arXiv, hopefully within the next few weeks.
Ensuring that AI agents are corrigible, meaning they do not take actions to preserve their existing goals, is a critical component of almost any plan for alignment. It allows for humans to modify their goal specifications for an AI, as well as for AI agents to learn goal specifications over time, without incentivizing the AI to interfere with that process. As an extreme example of corrigibility’s...
Thank you for writing this and posting it! You told me that you'd post the differences with "Safely Interruptible Agents" (Orseau and Armstrong 2016). I think I've figured them out already, but I'm happy to be corrected if I'm wrong.
for the corrigibility transformation, all we need to do is break the tie in favor of accepting updates, which can be done by giving some bonus reward for doing so.
The "The Corrigibility Transformation" section to me explains the key difference. Rather than modifying the Q-learning upda...
The first in a series of bite-sized rationality prompts[1].
This is my most common opening-move for Instrumental Rationality. There are many, many other pieces of instrumental rationality. But asking this question is usually a helpful way to get started. Often, simply asking myself "what's my goal?" is enough to direct my brain to a noticeably better solution, with no further work.
I'm playing Portal 2, or Baba is You. I'm fiddling around with the level randomly, sometimes going in circles. I notice I've been doing that awhile.
I ask "what's my goal?"
And then my eyes automatically glance at the exit for the level, and I realize I can't possibly make progress unless I solve a particular obstacle, which none of my fiddling around was going to help with.
I'm arguing with a...
Yeah, when I notice I'm stuck on a vague/complicated work task I ask "ok, what do I actually want here?" and this helps.
I guess to the extent that's different from "what's my goal", it's mostly that "what I want" may not be achievable or within my control, so my goal might be something more bounded than that or something with a chance but not a certainty of getting what I actually want.