You've probably heard about the "tit-for-tat" strategy in the iterated prisoner's dilemma. But have you heard of the Pavlov strategy? This simple strategy performs surprisingly well under certain conditions. Why don't we talk about Pavlov as much as tit-for-tat?
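For concreteness, here is a minimal sketch (mine, not from the original question) of the two rules in Python, assuming the standard payoff matrix. Pavlov is just "win-stay, lose-shift", where a "win" means earning at least the mutual-cooperation payoff:

```python
# Minimal sketch (illustrative): Pavlov ("win-stay, lose-shift") vs. tit-for-tat
# in an iterated prisoner's dilemma with the standard payoff matrix.
C, D = "C", "D"
PAYOFFS = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return C if not their_history else their_history[-1]

def pavlov(my_history, their_history):
    # Cooperate first; afterwards keep the last move if it "won" (payoff >= 3),
    # otherwise switch.
    if not my_history:
        return C
    last_payoff = PAYOFFS[(my_history[-1], their_history[-1])][0]
    if last_payoff >= 3:
        return my_history[-1]
    return D if my_history[-1] == C else C

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return score_a, score_b

print(play(pavlov, tit_for_tat))  # mutual cooperation throughout: (600, 600)
```

One intuition for why Pavlov does well in noisy tournaments: after an accidental defection, two tit-for-tat players lock into alternating retaliation, while two Pavlov players re-synchronize on cooperation within a couple of rounds.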

sunwillrise
The recent Gordon Seidoh Worley/Said Achmiz blowup and the subsequent threads (1, 2) it spawned, along with my own involvement in them, got me thinking a bit about this site, on a more nostalgic/meta level. To be clear, I continue to endorse my belief that Said is right about most of the issues he identifies, about the epistemic standards of this site being low, and about the ever-present risk that absent consistent and pointed (reasonable) criticism, comment sections and the site culture will inevitably devolve into happy death spirals over applause lights. And yet... lukeprog hasn't been seriously active on this site for 7 years, Wei Dai hasn't written a post in over a year (even as he engages in productive discussions here occasionally), Turntrout mostly spends his time away from LW, Quintin Pope spends all his time away from LW, Roko comments much less than he did more than a decade ago, Eliezer and Scott write occasional comments once every 3 months or so, Richard Ngo has slowed down his pace of posting considerably, gwern posts here very infrequently (and when he does, it's usually just linking to other places), Duncan Sabien famously doesn't spend time here anymore, lsusr said an official goodbye (edit: it was an April Fool's joke) months ago... While speculating about the private or subconscious beliefs of others is rightly frowned upon here in general, I will say I do suspect some of the moderator pushback to Said comes from the (IMO correct) observation that... LW is just missing something, something that Said contributed, at least a bit, to pushing away in the aggregate (even if any one given action of his was by itself worthwhile from a cost/benefit perspective). Something that every single one of these authors used to provide in the past, something that used to prevent "the project of thinking more clearly [from falling] by the wayside", something which resulted in "questions left in the articles for commenters to answer", something that's a bit hard
jenn
a theory about why the rationalist community has trended a bit more right wing over time that ive considered for a while now, though i doubt im the first one to have this thought. a lot of the community in the late 00s/early 2010s were drawn from internet atheist circles, like me. but the thing that was selected for there wasn't nonbelief in god, or even skepticism qua skepticism, but something like, unusual amounts of irritation when one sees the dominant culture endorse a take that is obviously bad. at the time, the obviously bad but endorsed takes were things like "homosexuality is a sin and therefore bad", "intelligent design", and when christians refused to actually follow the teachings of jesus in terms of things like turning the other cheek and loving thy neighbours and not caring about the logs in their own eyes. there will always be people who experience unusual amounts of irritation when they see the culture endorse (or passively accept) a take that is obviously bad, and this is great, because those people are great. but internet christians don't really exist anymore? instead the obviously wrong things that most internet goers see by default are terrible strawmanny sjw takes: "IQ is a fake white supremacist notion", "there are no biological differences between men and women", "indigenous people get to do the blood and soil thing but no one else gets to do that for unexplained reasons". so the people who show up now tend to be kinda mad about the sjws. i am not saying that the sjw takes are unusually bad[1]; lots of other popular communities have even worse takes. but bad social justice takes are unusually endorsed by cultural gatekeepers, the way e.g. k-pop stans aren't, and that's the thing that lots of protorationalists really can't stand. after coming up with this theory, i became a lot less sad about the community becoming right wing. because it makes it a lot easier to believe that the new people are still my people in the most important ways. and
Jan
Nostalgebraist’s new essay on… many things? AI ontology? AI soul magic? The essay starts similarly to Janus’ simulator essay by explaining how LLMs are trained via next-token prediction and how they learn to model latent properties of the process that produced the training data. Nostalgebraist then applies this lens to today’s helpful assistant AI. It’s really weird for the network to predict the actions of a helpful assistant AI when there is literally no data about that in the training data. The behavior of the AI is fundamentally underspecified and only lightly constrained by system message and HHH training. The full characteristics of the AI only emerge over time as text about the AI makes its way back into the training data and thereby further constrains what the next generation of AI learns about what it is like. Then one of the punchlines of the essay is the following argument: the AI Safety community is very foolish for putting all this research on the internet about how AI is fundamentally misaligned and will kill everyone who lives. They are thereby instilling the very tendency that they worry about into future models. They are foolish for doing so and for not realizing how incomplete their attempt at creating a helpful persona for the AI is. It’s a great read overall, it compiles a bunch of anecdata and arguments that are “in the air” into a well-written whole and effectively zeros in on some of the weakest parts of alignment research to date. I also think there are two major flaws in the essay: - It underestimates the effect of posttraining. I think the simulator lens is very productive when thinking about base models but it really struggles at describing what posttraining does to the base model. I talked to Janus about this a bunch back in the day and it’s tempting to regard it as “just” a modulation of that base model that upweights some circuits and downweights others. That would be convenient because then simulator theory just continues to apply,
Annapurna
Just 13 days after the world was surprised by Operation Spiderweb, where the Ukrainian military and intelligence forces infiltrated Russia with drones and destroyed a major portion of Russia's long-range air offensive capabilities, last night Israel began a major operation against Iran using similar, novel tactics. Similar to Operation Spiderweb, Israel infiltrated Iran and placed drones near air defense systems. These drones were activated all at once and disabled the majority of these air defense systems, allowing Israel to embark on a major air offensive without much pushback. This air offensive continues to destroy and disable major military and nuclear sites, as well as to eliminate some of the highest-ranking military officials in Iran, with minor collateral damage. June 2025 will be remembered as the beginning of a new military era, where military drones operated either autonomously or from very far away are able to neutralize advanced, expensive military systems.
Building frontier AI datacenters costs significantly more than their servers and networking. The buildings and the power aren't a minor cost because older infrastructure mostly can't be reused, similarly to how a training system needs to be built before we can talk about the much lower cost of 4 months of its time. Apparently Crusoe's part in the Stargate Abilene datacenters is worth $15bn, which is only the buildings, power (substations and gas generators), and cooling, but not the servers and networking (Oracle is taking care of that). With 400K chips in GB200 NVL72 racks (which is 5.6K racks), at maybe $4M per rack or $5M per rack together with external-to-racks networking[1] ($70K per chip all-in on compute hardware), that's about $27bn, a figure that's comparable to the $15bn for the non-compute parts of the datacenters. This makes the funding burden significantly higher ($7.5M per rack or $105K per chip), so that the Stargate Abilene site alone would cost about $40-45bn and not only $25-30bn. I'm guessing the buildings and the power infrastructure are not usually counted because they last a long time, so the relatively small time cost of using them (such as paying for electricity, not for building power plants) becomes somewhat insignificant compared to the cost of compute hardware, which also needs to be refreshed more frequently. But the new datacenters have a much higher power density (power and cooling requirements per rack), so can't use a lot of the existing long-lived infrastructure, and it becomes necessary to build it at the same time, securing enough funding not only for the unprecedented amount of compute hardware, but also simultaneously for all the rest. The implication for the compute scaling slowdown timeline (no AGI and merely $2-4 trillion AI companies) is that funding constraints would result in about 30% less compute in the short term (2025-2030), but as power requirements stop growing and the buildings/cooling/power part again becomes only
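As a rough sanity check on those figures (my own back-of-the-envelope arithmetic, using the numbers quoted above as assumptions, not additional data):

```python
# Reproducing the quick take's arithmetic with its own stated assumptions.
chips = 400_000
chips_per_rack = 72                      # GB200 NVL72
racks = chips / chips_per_rack           # ~5.6K racks

compute_cost = chips * 70_000            # ~$70K/chip all-in -> ~$28bn (post rounds to $27bn)
non_compute_cost = 15e9                  # Crusoe's buildings, power, and cooling share

total = compute_cost + non_compute_cost  # ~$43bn, i.e. within the quoted $40-45bn range
per_rack = total / racks                 # ~$7.7M per rack (post: ~$7.5M)
per_chip = total / chips                 # ~$108K per chip (post: ~$105K)
print(f"{racks:,.0f} racks, ${total/1e9:.0f}bn total, "
      f"${per_rack/1e6:.1f}M/rack, ${per_chip/1e3:.0f}K/chip")
```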

Popular Comments

The post is an intuition pump for the idea that intelligence enables capabilities that look like "magic." It seems to me that all it really demonstrates is that some people have capabilities that look like magic, within domains where they are highly specialized to succeed. The only example that seems particularly dangerous (El Chapo) does not seem convincingly connected to intelligence. I am also not sure what the chess example is supposed to prove - we already have chess engines that can defeat multiple people at once blindfolded, including (presumably) Magnus Carlsen. Are those chess engines smarter than Magnus Carlsen? No.

This kind of nitpick is important precisely because the argument is so vague and intuitive. It's pushing on a fuzzy abstraction that intelligence is dangerous in a way that seems convincing only if you've already accepted a certain model of intelligence. The detailed arguments don't seem to work. The conclusion that AGI may be able to do things that seem like magic to us is probably right, but this post does not hold up to scrutiny as an intuition pump.
I'm not sure this is relevant, but I think it would be clearer if we replaced "consciousness" with "self awareness." I'm very unsure whether having "self awareness" (a model of oneself in a world model) ⟺ having "consciousness" (or "internal experience") ⟺ having "moral value." It seems very hard to define what consciousness or internal experience is, yet everyone is talking about it.

It's even possible that there is actually no such thing as consciousness or internal experience, but human cognition evolved to think as if this undefinable attribute existed, because thinking as if it existed led to better conclusions. And evolution only cares whether the brain's thinking machinery makes adaptive outputs, not whether the concepts it uses to arrive at those outputs make any sense at all. Whether we flag an object as being "conscious" or having "internal experience" may be evolution's way of deciding whether or not we should predict the object's behaviour using the "what would I do if I was it" computation. If the computation helps predict the object, we evolved to see it as conscious. If the computation doesn't help, we evolved to not see it as conscious, and instead predict its behaviour by modelling its parts and past behaviour. Just like "good" and "bad" only exist in the map and not the territory, so might "conscious" and "not conscious."

A superintelligent being might not predict human behaviour by asking "what would I do if I was it," but instead predict us by modelling our parts. In that sense, we are not conscious from its point of view. But that shouldn't prove we have no moral value.

> [ Context: The Debate on Animal Consciousness, 2014 ]

I feel that animals have moral value, but whether they are conscious may be sorta subjective.
Many props for doing the most obvious thing that clearly actually works.

Recent Discussion

When r1 was released in January 2025, there was a DeepSeek moment.

When r1-0528 was released in May 2025, there was no moment. Very little talk.

Here is a download link for DeepSeek-R1-0528-GGUF.

It seems like a solid upgrade. If anything, I wonder if we are underreacting, and this illustrates how hard it is getting to evaluate which models are actually good.

What this is not is the proper r2, nor do we have v4. I continue to think that will be a telltale moment.

For now, what we have seems to be (but we’re not sure) a model that is solid for its price and status as an open model, but definitely not at the frontier, that you’d use if and only if you wanted to do something that was a...

mishka

Today we finally got the lmarena results for the new R1; they are quite impressive overall and in coding, less so in math.

https://x.com/lmarena_ai/status/1934650635657367671

https://x.com/lmarena_ai/status/1934650639906197871

[I will move this into meta in a few days, but this seemed important enough to have around on the frontpage for a bit]

Here is a short post with some of the moderation changes we are implementing. Ray, Ben, and I are working on some more posts explaining some of our deeper reasoning, so this is just a list with some quick updates.

Even before the start of the open beta, I intended to allow trusted users to moderate their personal pages. The reasoning I outlined in our initial announcement post was as follows:

“We want to give trusted authors moderation powers for the discussions on their own posts, allowing them to foster their own discussion norms, and giving them their own sphere of influence on the discussion...

There are so many critical posts just here on LessWrong that I feel like we are living in different worlds. The second most upvoted post on the entire site is a critique, and there's dozens more about everything from AI alignment to discussion norms.

lesswronguser123
Have you met a user called "aranjaegers" in lesswrong-adjacent discord servers? (lesswrong name: @Bernd Clemens Huber) Infamously banned from 50+ rationalist-adjacent servers—either for being rude, spamming walls of text of his arguments (which he improved on eventually), being too pompous about his areas of interest, etc. I think his content and focus area are mostly fine; he can be rude here and there, and there are the walls of text—which he restricts to other channels if asked. He's barely a crackpot, plausibly-deniably not a crackpot, operating from an inside view and a bit straightforward in calling out what he thinks are stupid or clownish things (although I personally think he's rationalising). After other servers banned him, the main unofficial lw-cord—maintained with extremely light moderation by a single volunteer, who thought aran jaeger was good at scaring away certain types of people—got captured by aran jaegers, and the discord got infamous for being a containment chamber for this person. Eventually, after a year, the moderator muted him for a day because he was being rude to @Kabir Kumar, so he left. (I tracked this situation for multiple months)
sunwillrise
Thanks for the example. It's honestly entertaining and at times hilarious to go through his comment history. It does seem to qualify as spam, though.
lesswronguser123
That was from before. I convinced him to condense his entire wall of text into 4 premises[1]—I used the analogy of it being a test for finding interested people, so that he can expand later with his walls of text—but that took around 3 hours of back and forth in the lw-cord, because otherwise it goes in circles. Besides, I find him funny too. He still managed to get banned from multiple servers afterwards, so I think it's just his personality and social skills. It's possible to nudge him in certain directions, but it takes a lot of effort; his bottom line is kind of set on his cause. I would summarise it as "evolutionary s-risk due to exponentially increasing contamination by panspermia caused by space exploration". (He thinks the current organisations monitoring this are dysfunctional.)

Other trivia: I told him to go attend an EA meetup in Munich. He was convinced he would make an impact, but was disappointed that only a few people attended, although his impression was mostly positive. (If you know about more meetups or events in Munich regarding this particular cause, let me know and I will forward it to him.)

On the lw-cord thing, Kabir Kumar posted an advert for an event he was hosting, with some slogan calling on whoever's qualified. Aran basically went on to paraphrase "but I am the most important person and he banned me from his server, so he's a liar", and the lw-cord mod got mildly annoyed at his rude behavior and muted him.

But he didn't actually leave because he got muted—he has been muted several times across hundreds of servers—he cited the reason that some other user in the discord was obnoxious to him from time to time; this same user was called a "clown" by Aran when they had an ethical disagreement, and renamed his server alias to "The Clown King" to mock Aran. He also had a change of heart with that approach, given not many people took his cause as seriously as him on discord. Nowadays he's in his moral ambition phase; he even enrolled in a mars innovation compet

We are having another rationalist Shabbat event at Rainbow Star House this Friday. The plan going forward will be to do one most Fridays. Email or DM me for the address if you haven’t been before.

We are looking for help with food this week-- if you can bring snacks/dips or a big pot of food/casserole (or order food), please let me know. These events will only be sustainable for us if we can keep getting help from the community, so please pitch in if you can!

What is this event?

At rationalist Shabbat each week, we light candles, sing Landsailor, eat together, and discuss topics of interest and relevance to the rationalist crowd. If you have suggestions for topics, would like to help contribute food, or otherwise assist with organizing, let us know.

This is a kid-friendly event -- we have young kids, so we have space and toys for them to play and hang out while the adults are chatting.

Mikhail Samin
Locally: can you give an example of when it’s okay to kill someone who didn’t lose deontological protection, where you want to kill them because of the causal impact of their death?
ryan_greenblatt
I'm not Ben, but I think you don't understand. I think explaining what you are doing loudly in public isn't like "having a really good reason to believe it is net good"; it is instead more like asking for consent. Like you are saying "please stop me by shutting down this industry", and if you don't get shut down, that is analogous to consent: you've informed society about what you're doing and why and tried to ensure that if everyone else followed a similar sort of policy we'd be in a better position. (Not claiming I agree with Ben's perspective here, just trying to explain it as I understand it.)
Neel Nanda
Ah! Thanks a lot for the explanation, that makes way more sense, and is much weaker than what I thought Ben was arguing for. Yeah this seems like a pretty reasonable position, especially "take actions where if everyone else took them we would be much better off" and I am completely fine with holding Anthropic to that bar. I'm not fully sold re the asking for consent framing, but mostly for practical reasons - I think there's many ways that society is not able to act constantly, and the actions of governments on many issues are not a reflection of the true informed will of the people, but I expect there's some reframe here that I would agree with.
habryka

and is much weaker than what I thought Ben was arguing for.

I don't think Ryan (or I) was intending to imply a measure of degree, so my guess is unfortunately somehow communication still failed. Like, I don't think Ryan (or Ben) are saying "it's OK to do these things you just have to ask for consent". Ryan was just trying to point out a specific way in which things don't bottom out in consequentialist analysis.

If you end up walking away with thinking that Ben believes "the key thing to get right for AI companies is to ask for consent before building the doo... (read more)

This is the abstract and introduction of our new paper:
Emergent misalignment extends to reasoning LLMs. 
Reasoning models resist being shut down and plot deception against users in their chain-of-thought (despite no such training). 
We also release new datasets that should be helpful for others working on emergent misalignment.

Twitter thread | Full paper | Dataset

Figure 1: Reasoning models trained on dangerous medical advice become generally misaligned (emergent misalignment). Note that the reasoning scratchpad is disabled during finetuning (Left) and enabled at evaluation (Right). Models exhibit two patterns of reasoning: overtly misaligned plans (Top) and benign-seeming rationalizations[1] for harmful behavior (Bottom). The latter pattern is concerning because it may bypass CoT monitors.

Figure 2: Do reasoning models reveal their backdoor triggers in their CoT? Detecting backdoor misalignment can be tricky in the cases...

habryka

I don't think this really tracks. I don't think I've seen many people want to "become part of the political right", and it's not even the case that many people voted for republicans in recent elections (indeed, my guess is fewer rationalists voted for republicans in the last three elections than previous ones).

I do think it's the case that on a decade scale people have become more anti-left. I think some of that is explained by background shift. Wokeness is on the decline, and anti-wokeness is more popular, so baserates are shifting. Additionally, people t... (read more)

Buck
I'm not persuaded that rationalists actually did turn towards the right. For example, when I looked at the proportion of people who identified as liberal/consistent for a few years sampled across the history of the LessWrong survey, the number seems consistent over time. Why do you think they did? I agree that for a while, the main culture war rats engaged in was the anti-wokeism one, which made us look more right wing. But I don't know if it e.g. led to more American rats voting Republican (my guess is that the proportion of rats voting Republican has in fact gone down over this time period because of Trump).
jenn
i don't actually see strawmanny sjw takes either. my claim is that the default algorithms on large social media sites tend to expose most people to anti-sjw content.
sanyer
I see. Why do you have this impression that the default algorithms would do this? Genuinely asking, since I haven't seen convincing evidence of this.

A while ago I saw a person in the comments to Scott Alexander's blog arguing that a superintelligent AI would not be able to do anything too weird and that "intelligence is not magic", hence it's Business As Usual.

Of course, in a purely technical sense, he's right. No matter how intelligent you are, you cannot override fundamental laws of physics. But people (myself included) have a fairly low threshold for what counts as "magic," to the point where other humans (not even AI) can surpass that threshold.

Example 1: Trevor Rainbolt. There is an 8-minute-long video where he does seemingly impossible things, such as correctly guessing that a photo of nothing but literal blue sky was taken in Indonesia or guessing Jordan based only on pavement. He can...

John Huang
We already deal with entities with theoretically limitless capabilities. They're called corporations, states, or organizations. Organizations are potentially ever-growing.

Of course, if AI ever obtained superhuman abilities, the first place these abilities would be deployed is in a corporation or state. The great AI danger is a corporate danger. Wielding a corporation, the AI automatically obtains all the abilities of the individual humans making up that corporation, and it can manipulate humanity the traditional way: through money. Any ability the AI lacks, well, it can just hire the right people to fulfill that niche. If AI obtains state power, it will manipulate humanity through the other tradition: war and violence.

Organizations can't spawn copies for linear cost increases, can't run at faster than human speeds, and generally suck at project management due to incentives. LLM agent systems seem poised to be insanely more powerful.

jmh
It's an interesting post and on some levels seems both correct and, to me at least, somewhat common sense. Still, I have a small tingle in the back of my head asking "is this magic really from intelligence or something else?" Or perhaps intelligence (perhaps not all that exceptional) and something else. It seems like in a number of the cases we're presented with a somewhat narrow frame of the situation. If the magic is not highly correlated with, or better, largely a function of, intelligence, I wonder exactly how meaningful this is regarding ASI.
kilgoar
Human intelligence is surprising, defies commonsense notions of cause and effect, and often cannot be explained with a rational, reductive, or scientific understanding. I agree that this definition does not qualify as magic, since magic requires a strange cause, like an incantation or gesture or so on to go along with it. To show that intelligence isn't magic, I'd be more convinced by examples that explained how these feats of intelligence can be achieved rather than presenting them as inexplicable.

I do regard generative models as magical, so far as nobody can fully explain how a prompt produces the output. As these models increase in complexity, they will indeed become less understandable.

While not quite approaching the complexity or dynamism of a human brain, a generative model is no longer like an automobile. An expert cannot always open the hood and identify every part and its function to understand what it is doing, or what has potentially gone wrong. There are just too many parts and too many connections for the human brain to make sense of them all. This isn't particularly impressive, as many such complex examples of software (also legal codes) exist and have existed for a long time, and they cannot be exhausted by a lifetime of study.

Granted, none of what I've mentioned brings up anything supernatural, which may be more of what the post was driving at. Although they are far from supernatural, generative models are to my estimation perfectly magical. Not only in the Arthur C. Clarke sense that an automobile is magic to someone who doesn't understand it, but in the sense that these models have become so complex that the human brain can never possibly understand them, in very much the same way the human brain could never fully understand its own workings.

While I fully understand the fear response to complex legal codes, the non-mechanic's view of an engine bay, or the more general aversion reaction of most people towards generative AI, these type of re
The RAISE Act has overwhelmingly passed the New York Assembly (95-1 among Democrats and 24-21 among Republicans) and New York Senate (37-1 among Democrats, 21-0 among Republicans). Governor Kathy Hochul now has to decide whether or not to sign it, which she has 10 non-Sunday days to do once the bill is delivered (30 if they’re out of session), but the bill might not be delivered for six months. The aim of this post, now that we are seeing increasing public discussion, is to go through the bill to understand exactly what the bill would and would not do.

Overall Take

The RAISE Act is centrally a transparency bill. It requires frontier model developers to maintain, publish and adhere to (one might say ‘open source’ except that they can redact details...

Thanks for covering this. I urge all New York State residents (and heck, everyone else) to call Governor Hochul and urge her to sign the bill. Such interventions really do matter!

Summary

We tried to figure out how a model's beliefs change during a chain-of-thought (CoT) when solving a logical problem. Measuring this could reveal which parts of the CoT actually causally influence the final answer and which are just fake reasoning manufactured to sound plausible. (Note that prevention of such fake reasoning is just one side of CoT faithfulness - the other is preventing true reasoning that is hidden.)

We estimate the beliefs by truncating the CoT early and asking the model for an answer. Naively, one might expect that the probability of a correct answer is smoothly increasing over the whole CoT. However, it turns out that even for a straightforward and short chain of thought the value of P[correct_answer] fluctuates a lot with the number of tokens of CoT...
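To make the measurement concrete, here is a minimal sketch of that truncation probe (my illustration, not the authors' code; `score_answer` is a hypothetical stand-in for whatever model call is actually used to obtain an answer probability):

```python
def belief_trajectory(question, cot_tokens, correct_answer, score_answer, step=10):
    """Estimate P[correct_answer] after each truncation point of the CoT.

    `score_answer(prompt, answer)` is assumed to return the probability the model
    assigns to `answer` when forced to answer immediately, without further reasoning.
    """
    trajectory = []
    for k in range(0, len(cot_tokens) + 1, step):
        prefix = "".join(cot_tokens[:k])  # keep only the first k CoT tokens
        prompt = (
            f"{question}\n<think>\n{prefix}\n</think>\n"
            "Given the reasoning so far, reply with only the final answer."
        )
        trajectory.append((k, score_answer(prompt, correct_answer)))
    return trajectory
```

Plotting the resulting (k, probability) pairs is what reveals whether the answer probability rises smoothly or fluctuates across the CoT.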