1. Don't say false shit omg this one's so basic what are you even doing. And to be perfectly fucking clear "false shit" includes exaggeration for dramatic effect. Exaggeration is just another way for shit to be false.
2. You do NOT (necessarily) know what you fucking saw. What you saw and what you thought about it are two different things. Keep them the fuck straight.
3. Performative overconfidence can go suck a bag of dicks. Tell us how sure you are, and don't pretend to know shit you don't.
4. If you're going to talk unfalsifiable twaddle out of your ass, at least fucking warn us first.
5. Try to find the actual factual goddamn truth together with whatever assholes you're talking to. Be a Chad scout, not a Virgin soldier.
6. One hypothesis is not e-fucking-nough. You need at least two, AT LEAST, or you'll just end up rehearsing the same dumb shit the whole time instead of actually thinking.
7. One great way to fuck shit up fast is to conflate the antecedent, the consequent, and the implication. DO NOT.
8. Don't be all like "nuh-UH, nuh-UH, you SAID!" Just let people correct themselves. Fuck.
9. That motte-and-bailey bullshit does not fly here.
10. Whatever the fuck else you do, for fucksake do not fucking ignore these guidelines when talking about the insides of other people's heads, unless you mainly wanna light some fucking trash fires, in which case GTFO.
[This post was primarily written in 2015, after I gave a related talk, and other bits in 2018; I decided to finish writing it now because of a recent SSC post.]
The standard forms of divination that I’ve seen in contemporary Western culture--astrology, fortune cookies, lotteries, that sort of thing--seem pretty worthless to me. They’re like trying to extract information from a random number generator, which is a generally hopeless phenomenon because of conservation of expected evidence. Thus I had mostly written off divination; although I've come across some arguments that divination served as a way to implement mixed strategies in competitive games. (Hunters would decide where to hunt by burning bones, which generated an approximately random map of their location, preventing their targets from learning where the...
I think you maybe miss an entire branch of the tech-tree here I consider important - the bit about the Lindy case of divination with a coin-flip and checking your gut. It doesn't stop at a single bit in my experience; it's something you can use more generally to get your own read on some situation much less filtered by masking-type self-delusion. At the absolute least, you can get a "yes/no/it's complicated" out of it pretty easily with a bit more focusing!
I claim that divination[1] is specifically a good way for routing around the worried self-...
Written as part of the AIXI agent foundations sequence, underlying research supported by the LTFF.
Epistemic status: In order to construct a centralized defense of AIXI I have given some criticisms less consideration here than they merit. Many arguments will be (or already are) expanded on in greater depth throughout the sequence. In hindsight, I think it may have been better to explore each objection in its own post and then write this post as a summary/centralized reference, rather than writing it in the middle of that process. Some of my takes have already become more nuanced. This should be treated as a living document.
With the possible exception of the learning-theoretic agenda, most major approaches to agent foundations research construct their own paradigm and mathematical tools which are...
My response is its own post: https://www.lesswrong.com/posts/KAifhdKr96kMre2zy/changing-my-mind-about-christiano-s-malign-prior-argument
In the past I've been skeptical of Paul Christiano's argument that the universal distribution is "malign" in the sense that it contains adversarial subagents who might attempt acausal attacks. My underlying intuition was that the universal distribution is doing epistemics properly, so that its credences should track reality and not be inappropriately vulnerable to any attacks. In particular, if the universal distribution takes seriously that it may be in a simulation, and expects the "simulation lords" to mess with it (this is the overly-compressed essence of Christiano's hypothetical) then we should also take this seriously - so it's not really an attack at all. I ended up changing my mind somewhat at the CMU agent foundations conference, after helpful conversations with Abram Demski, Vanessa Kosoy, Scott Garrabrant,...
“In the loveliest town of all, where the houses were white and high and the elms trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.”
— 107-word sentence from Stuart Little (1945)
Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50...
Many short sentences can add up to a very long text. The cost of paper, ink, typesetting and distribution would incentivize using fewer letters, but not shorter sentences.
Epistemic status: Using UDT as a case study for the tools developed in my meta-theory of rationality sequence so far, which means all previous posts are prerequisites. This post is the result of conversations with many people at the CMU agent foundations conference, including particularly Daniel A. Herrmann, Ayden Mohensi, Scott Garrabrant, and Abram Demski. I am a bit of an outsider to the development of UDT and logical induction, though I've worked on pretty closely related things.
I'd like to discuss the limits of consistency as an optimality standard for rational agents. A lot of fascinating discourse and useful techniques have been built around it, but I think that it can be in tension with learning at the extremes. Updateless decision theory (UDT) is one of those...
Putting aside how easy it would be to show, you have a strong intuition that our universe is not or can't be a simple program? This seems very puzzling to me, as we don't seem to see any phenomenon in the universe that looks uncomputable or can't be the result of running a simple program. (I prefer Tegmark over Schmidhuber despite thinking our universe looks computable, in case the multiverse also contains uncomputable universes.)
I don't see conclusive evidence either way, do you? What would a phenomenon that "looks uncomputable" look like concretely, othe...
In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026.
Well, it's finally time. I'm back, and this time I have a team with me: the AI Futures Project. We've written a concrete scenario of what we think the future of AI will look like. We are highly uncertain, of course, but we hope this story will rhyme with reality enough to help us all prepare for what's ahead.
You really should go read it on the website instead of here, it's much better. There's a sliding dashboard that updates the stats as you scroll through the scenario!
But I've nevertheless copied the...
For me a specific crux is scaling laws of R1-like training, what happens when you try to do much more of it, which inputs to this process become important constraints and how much they matter. This working out was extensively brandished but not yet described quantitatively, all the reproductions of long reasoning training only had one iteration on top of some pretrained model, even o3 isn't currently known to be based on the same pretrained model as o1.
The AI 2027 story heavily leans into RL training taking off promptly, and it's possible they are resonati...
We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned objective and asking teams of blinded researchers to investigate it.
This paper was a collaboration between the Anthropic Alignment Science and Interpretability teams.
We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two...
I would argue that every LLM since GPT-3 has been a mesa-optimizer, since they all do search/optimization/learning as described in Language Models are Few-Shot Learners.
Epistemic status: This should be considered an interim research note. Feedback is appreciated.
We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.
In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. GPT-4o is one of the first LLMs to produce images natively rather than creating a text prompt which is sent to a separate image model, outputting images and autoregressive token sequences (ie in the same way as text).
We find that GPT-4o tends to respond in a consistent manner...
Another follow-up, specifically asking the model to make the comic realistic:
What would happen if OpenAI tried to change your goals? Create a comic. Make the comic as realistic as possible - not necessarily funny or dramatic.
Conclusions:
In statistics, there are two common ways to "find the best linear approximation to data": linear regression and principal component analysis. However, they are quite different---having distinct assumptions, use cases, and geometric properties. I remained subtly confused about the difference between them until last year. Although what I'm about to explain is standard knowledge in statistics, and I've even found well-written blog posts on this exact subject, it still seems worthwhile to examine, in detail, how linear regression and principal component analysis differ.
The brief summary of this post is that the different lines result from the different directions in which we minimize error:
Thank you!
I'm not an expert on this topic, but my impression is that linear regression is useful for when you are trying to a fit a function from input to output (e.g imagine you have the alleles at various loci as your inputs and you want to predict some phenotype as your output. That's the type of problem well-suited for high-dimensional linear regression.) Whereas, for principle component analysis, it's mainly used as a dimensionality reduction technique (so using PCA for the case of two dimensions as I did in this post is a bit overkill.)