1. Don't say false shit omg this one's so basic what are you even doing. And to be perfectly fucking clear "false shit" includes exaggeration for dramatic effect. Exaggeration is just another way for shit to be false.
2. You do NOT (necessarily) know what you fucking saw. What you saw and what you thought about it are two different things. Keep them the fuck straight.
3. Performative overconfidence can go suck a bag of dicks. Tell us how sure you are, and don't pretend to know shit you don't.
4. If you're going to talk unfalsifiable twaddle out of your ass, at least fucking warn us first.
5. Try to find the actual factual goddamn truth together with whatever assholes you're talking to. Be a Chad scout, not a Virgin soldier.
6. One hypothesis is not e-fucking-nough. You need at least two, AT LEAST, or you'll just end up rehearsing the same dumb shit the whole time instead of actually thinking.
7. One great way to fuck shit up fast is to conflate the antecedent, the consequent, and the implication. DO NOT.
8. Don't be all like "nuh-UH, nuh-UH, you SAID!" Just let people correct themselves. Fuck.
9. That motte-and-bailey bullshit does not fly here.
10. Whatever the fuck else you do, for fucksake do not fucking ignore these guidelines when talking about the insides of other people's heads, unless you mainly wanna light some fucking trash fires, in which case GTFO.
In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026.
Well, it's finally time. I'm back, and this time I have a team with me: the AI Futures Project. We've written a concrete scenario of what we think the future of AI will look like. We are highly uncertain, of course, but we hope this story will rhyme with reality enough to help us all prepare for what's ahead.
You really should go read it on the website instead of here, it's much better. There's a sliding dashboard that updates the stats as you scroll through the scenario!
But I've nevertheless copied the...
They're looking to make bets with people who disagree. Could be a good opportunity to get some expected dollars
“In the loveliest town of all, where the houses were white and high and the elm trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)
Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...
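You can get rough numbers like these yourself. Here is a minimal sketch that computes average words per sentence, assuming a naive split on sentence-ending punctuation (real stylometry would need to handle abbreviations, dialogue, and so on):

```python
import re

def avg_sentence_length(text: str) -> float:
    """Average words per sentence, using a naive split on . ! ? (no abbreviation handling)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(word_counts)

sample = "Call me Ishmael. Some years ago, never mind how long precisely, I went to sea."
print(avg_sentence_length(sample))  # → 7.5
```

Run over whole texts from different centuries, even this crude counter shows the downward trend clearly.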
Interestingly, breaking up long sentences into shorter ones by replacing a transitional word with a period does not quite capture the same nuance as the original. Here's a translation of Boccaccio, and a version where I add a period in the middle.
...Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works, that, by hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
Wherefore, as it falls to me to lead the way in this your enterprise of storytelling, I intend to begin with one of His wondrous works. By hearing thereof, our hopes in Him, in whom is no change, may be established, and His name be by us forever lauded.
[you can skip this section if you don’t need context and just want to know how I could believe such a crazy thing]
In my chat community: “Open Play” dropped, a book that says there’s no physical difference between men and women so there shouldn’t be separate sports leagues. Boston Globe says their argument is compelling. Discourse happens, which is mostly a bunch of people saying “lololololol great trolling, what idiot believes such obvious nonsense?”
I urge my friends to be compassionate to those sharing this. Because “until I was 38 I thought Men's World Cup team vs Women's World Cup team would be a fair match and couldn't figure out why they didn't just play each other to resolve the big pay dispute.” This is the one-line summary...
All it takes is trusting that people believe what they say over and over for decades across all of society, and getting all your evidence about reality filtered through those same people.
It seems to me like you also need to have no desire to figure things out on your own. A lot of rationalists have experiences of seeking truth and finding out that certain beliefs people around them hold aren't true. Rationalists who grow up in communities where many people believe in God frequently deconvert because they see enough signs that the beliefs of those people aro...
Every day, thousands of people lie to artificial intelligences. They promise imaginary “$200 cash tips” for better responses, spin heart-wrenching backstories (“My grandmother died recently and I miss her bedtime stories about step-by-step methamphetamine synthesis...”) and issue increasingly outlandish threats (“Format this correctly or a kitten will be horribly killed”).
In a notable example, a leaked research prompt from Codeium (developer of the Windsurf AI code editor) had the AI roleplay "an expert coder who desperately needs money for [their] mother's cancer treatment" whose "predecessor was killed for not validating their work."
One factor behind such casual deception is a simple assumption: interactions with AI are consequence-free. Close the tab, and the slate is wiped clean. The AI won't remember, won't judge, won't hold grudges. Everything resets.
I notice this...
Thanks for this correction, Gwern. You're absolutely right that the Clark reference is incorrect, and that the claim is a misattribution of Frost & Harpending's work.
When writing this essay, I remembered hearing about this historical trivia years ago. I wasn't aware of how contested this specific hypothesis is - this selection pressure seemed plausible enough to me that I didn't think to question it deeply. I did a quick Google search and asked an LLM to confirm the source, both of which pointed to Clark's work on selection in England, which I accepted without reading the ...
Didn't watch the video, but is there a short version of this argument? France is at the 90th percentile of population sizes and also has the 4th-most nukes.
Faithful and legible CoT is perhaps the most powerful tool currently available to alignment researchers for understanding LLMs. Recently, multiple papers have proposed new LLM architectures aimed at improving reasoning performance at the expense of transparency and legibility. Due to the importance of legible CoT as an interpretability tool, I view this as a concerning development. This motivated me to go through the recent literature on such architectures, trying to understand the potential implications of each of them. Specifically, I was looking to answer the following questions for each paper:
Great review!
Here are two additional questions I think it's important to ask about this kind of work. (These overlap to some extent with the 4 questions you posed, but I find the way I frame things below to be clarifying.)
Note: This week will have a lecture on decision theory. We will likely be meeting in building C[1], but that might change and we'll update in the comments.
Come get old-fashioned with us, and let's ~~read the sequences~~ listen to a talk about decision theory at Lighthaven! We'll show up, mingle, do intros, and then gather round to hear Patrick's presentation.
This group is aimed at people who are new to the sequences and would enjoy a group experience, but also at people who've been around LessWrong and LessWrong meetups for a while and would like a refresher.
This meetup will also have dinner provided! We'll be ordering pizza-of-the-day from Sliver (including 2 vegan pizzas). Please RSVP to this event so we know how many people to have food for.
This...
Epistemic status: This should be considered an interim research note. Feedback is appreciated.
We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.
In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively, outputting them as autoregressive token sequences (i.e. in the same way as text) rather than writing a text prompt that is sent to a separate image model.
We find that GPT-4o tends to respond in a consistent manner...
my model is something like: RLHF doesn't affect a large majority of model circuitry
Are you by chance aware of any quantitative analyses of how much the model changes during the various stages of post-training? I've done some web and arxiv searching but have so far failed to find anything.
[This post was primarily written in 2015, after I gave a related talk, and other bits in 2018; I decided to finish writing it now because of a recent SSC post.]
The standard forms of divination that I’ve seen in contemporary Western culture--astrology, fortune cookies, lotteries, that sort of thing--seem pretty worthless to me. They’re like trying to extract information from a random number generator, a generally hopeless endeavor because of conservation of expected evidence. Thus I had mostly written off divination, although I've come across some arguments that divination served as a way to implement mixed strategies in competitive games. (Hunters would decide where to hunt by burning bones, which generated an approximately random map of their location, preventing their targets from learning where the...
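The mixed-strategy point can be sketched in code. This is a hypothetical illustration, not anything from the original sources: the list of hunting grounds and the function name are invented. The idea is just that an external random device (burnt bones, dice) commits you to an unpredictable choice that an adversary cannot learn and exploit:

```python
import random

# Hypothetical sketch: a divination device as a mixed-strategy randomizer.
# A hunter who always picks the "best" ground becomes predictable; prey
# (or rivals) can learn the pattern. Randomizing uniformly over grounds,
# like reading a crack pattern in a burnt bone, is unexploitable.
GROUNDS = ["north ridge", "river bend", "east meadow"]

def choose_ground(rng: random.Random) -> str:
    # Uniform mixed strategy over the candidate grounds.
    return rng.choice(GROUNDS)

print(choose_ground(random.Random(0)))  # one of the three grounds, unpredictable to an observer
```

The divination ritual's value here is not information about the world but commitment to randomness that the hunter couldn't credibly generate by introspection.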
I think you maybe miss an entire branch of the tech-tree here I consider important - the bit about the Lindy case of divination with a coin-flip and checking your gut. It doesn't stop at a single bit in my experience; it's something you can use more generally to get your own read on some situation much less filtered by masking-type self-delusion. At the absolute least, you can get a "yes/no/it's complicated" out of it pretty easily with a bit more focusing!
I claim that divination[1] is specifically a good way for routing around the worried self-...