Quick Takes

Benito's Shortform Feed
Ben Pace16h483

Announcing new reacts!

The reacts palette has 17 new reacts:

Gone are 11 old reacts that were getting little use, were redundant with the new reacts, and/or in my opinion weren't being used well (for example, "I'd bet this is false" almost never turned into a bet as I'd hoped, but was primarily used to disagree with added emphasis).

Reacts are a rich way to interact with a body of text, allowing many people to give information about comments, claims, and subclaims more efficiently than writing full comments, and they're something I love that we have on LessWron... (read more)

Rana Dexsin43m20

I think I semi-agree with your perception, but I did have a recent experience to the contrary: when I did a throwaway post about suddenly noticing a distinction in the reaction UI, I found it very odd that some people marked the central bit “Insightful”. Like, maybe it's useful to them that I pointed it out, but it's a piece of UI that was (presumably) specifically designed that way by someone already! There's no new synthesis going on or anything; it's not insightful. (Or maybe people wanted to use it as a test of the UI element, but then why not the paperclip or the saw-that eyes?)

2Mateusz Bagiński1h
I guess. But I would think the bigger issue is that people don't notice.
2Rana Dexsin1h
Everything2 did this with votes. I think the “votes per day” limit used to be more progressive by user level, but it's possible that's me misremembering; looking at it now, it seems like they have a flat 50/day once you reach the level where voting is unlocked at all. Here's what looks like their current voting/experience system doc. (Note that E2 has been kind of unstable for me, so if you get a server error, try again after a few minutes.)
Cole Wyeth's Shortform
Cole Wyeth2h40

A while back, I claimed that LLMs had not produced original insights, resulting in this question: https://www.lesswrong.com/posts/GADJFwHzNZKg2Ndti/have-llms-generated-novel-insights

To be clear, I wasn’t talking about the kind of weak original insight like saying something technically novel and true but totally trivial or uninteresting (like, say, calculating the value of an expression that was maybe never calculated before but could be calculated with standard techniques). Obviously, this is kind of a blurred line, but I don’t think it’s an empty claim at ... (read more)

Habryka's Shortform Feed
habryka6y826

Thoughts on integrity and accountability

[Epistemic Status: Early draft version of a post I hope to publish eventually. Strongly interested in feedback and critiques, since I feel quite fuzzy about a lot of this]

When I started studying rationality and philosophy, I had the perspective that people who were in positions of power and influence should primarily focus on how to make good decisions in general and that we should generally give power to people who have demonstrated a good track record of general rationality. I also thought of power as this mostly u... (read more)

1Jasnah Kholin9h
  did you publish it eventually? 
habryka2h20

Yep! 


Integrity and accountability are core parts of rationality 

1Saul Munn1y
ah, lovely! maybe add that link as an edit to the top-level shortform comment?
Tomás B.'s Shortform
Tomás B.7h531

Dwarkesh asked a very interesting question in his Sutton interview, which Sutton wasn't really interested in replying to.

Dwarkesh notes that one idea for why the bitter lesson was true is that general methods got to ride the wave of exponential computing power while knowledge engineering could not, as labour was comparatively fixed in supply. He then notices that post-AGI, the labour supply will increase at a very rapid pace. And so he wonders: once the labour constraint is solved post-AGI, will GOFAI make a comeback? For we will then be able to afford the proverbial three billion philosophers writing Lisp predicates, or whatever other kinds of high-labour AI techniques become possible.

Random Developer4h80

I don't actually buy this argument, but I think it's a very important argument for someone to make, and for people to consider carefully. So thank you to Dwarkesh for proposing it, and to you for mentioning it!

I've been writing up a long-form argument for why "Good Old Fashioned AI" (GOFAI) is a hopeless pipedream. I don't know if that would actually remain true for enormous numbers of superintelligent programmers! But if I had to sketch out the rough form of the argument, it would go something like this:

  • Your inputs are all "giant, inscrutable matrices",
... (read more)
7No77e6h
Of course, the same consideration applies to theoretical agent-foundations-style alignment research.
Dalcy's Shortform
Dalcy17h130

Becoming Stronger™ (Sep 13 - Sep 27)

Notes and reflections on the things I've learned while Doing Scholarship™ this week (i.e. studying math)[1].

  • I am starting to see the value of categorical thinking.
    • For example, from [FOAG], it was quite mindblowing to learn that the stalk (the set of germs at a point) can be equivalently defined as a simple colimit of sections of the presheaf over the open sets of X containing that point (written out below), and this definition made proving certain constructions (e.g. inducing a map of stalks from a map ϕ:X→Y) very easy.
    • Also, I was first introd
... (read more)
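For reference, here is the colimit description mentioned above, in standard notation (which may differ slightly from FOAG's): the stalk of a presheaf ℱ at a point p ∈ X is

\mathcal{F}_p \;=\; \varinjlim_{p \in U} \mathcal{F}(U) \;=\; \Big( \coprod_{p \in U} \mathcal{F}(U) \Big) \Big/ \sim, \qquad (U, s) \sim (V, t) \;\iff\; \exists\, W \subseteq U \cap V \text{ with } p \in W \text{ and } s|_{W} = t|_{W}.

The right-hand description recovers germs as equivalence classes of local sections, and the universal property of the colimit is what makes constructions like the induced map on stalks nearly automatic.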
3Algon13h
Woit's "Quantum Theory, Groups and Representations" is fantastic for this IMO. It gives physical motivation for representation theory, connects it to invariants and, of course, works through the physically important lie-groups. The intuitions you build here should generalize. Plus, it's well written.  Also, if you are ever in the market for differential topology, algebraic topology, and algebraic geometry, then I'd recommend Ronald Brown's "Topology and Groupoids." It presents the basic material of topology in a way that generalizes better to the fields above, along with some powerful geometric tools for calculations. Both author's provide free pdfs of their books.
3Dalcy5h
Thanks for the recommendation! Woit's book does look fantastic (also as an introduction to quantum mechanics). I also know Sternberg's Group Theory and Physics to be a good representation theory & physics book. I did encounter Brown's book during my search for algebraic topology books, but I passed it over in favor of Bredon's because it didn't develop homology/cohomology to the extent I was interested in. Though the groupoid perspective does seem very interesting and useful, so I might read it after completing my current set of textbooks.
Algon4h20

No worries! For more recommendations like those two, I'd suggest having a look at "The Fast Track" on Sheafification. Of the books I've read from that list, all were fantastic. Note that the site emphasises mathematics relevant for physics, and vice versa, so it might not be everyone's cup of tea. But given your interests, I think you'll find it useful.

Vanessa Kosoy's Shortform
Vanessa Kosoy11h40

It occurred to me that one major success story of banning a technology (whether justifiably or not) is the laws against genetic engineering and cloning in humans. It makes me wonder what we can learn from that which is applicable to promoting a global moratorium on ASI.

Dagon5h20

I like this line of thinking - what other things are possible and desirable (to some), but are prevented universally?  Nuclear and biological weapons qualify, but they're nowhere near banned, just limited.  

I don't know enough about human genetic laws to know what's actually highly-desirable (by some) and prohibited so effectively that it doesn't happen.  Cloning seems a non-issue, as there's so little benefit compared to the mostly-allowed IVF and embryo selection processes available.    

2niplav10h
Nuclear weapons/energy inspire fear; genetic engineering and cloning violate purity/disgust intuitions.
5Mateusz Bagiński10h
It seems not-very-unlikely to me that, over the next few years, many major (and some non-major) world religions will develop a "Butlerian" attitude to machine intelligence: deeming it a profanity to attempt to replicate (or even to do things that have a non-negligible chance of replicating) the so-far-unique capacities/properties of the human mind, and using that to justify their support of a ban, alongside the catastrophic/existential risks on which they (or some fraction of them) would agree with worried seculars. In a sense, both human bio-engineering and AI can be seen by conservatively religious folks as "manipulating the God-given essence of humanity", which amounts to admitting that God's creation is flawed/imperfect/in need of further improvement.
leogao's Shortform
leogao3d*1200

i recently ran into a vegan advocate tabling in a public space, and spoke briefly to them for the explicit purpose of better understanding what it feels like to be the target of advocacy on something i feel moderately sympathetic towards but not fully bought in on. (i find this kind of thing very valuable for noticing flaws in myself and improving; it's much harder to be perceptive of one's own actions otherwise). the part where i am genuinely quite plausibly persuadable of his position in theory is important; i think if i had talked to e.g flat earther... (read more)

Ram Potham7h10

you mentioned sometimes people are just wrong in their arguments but think they are correct because they've repeated it many times. do you have examples of this from what they said?

2Tiuto18h
I don't think it would be that hard to refute flat earthers. One or two facts about how the sun travels, that the atmosphere bends light, and the fact that there are commercial flights crossing the poles seem like they would be sufficient to me. This probably won't convince a flat earther, but I think you could fairly easily convince 95% of smart unbiased third-party listeners (not that they exist). You don't have to go down every option in their argument tree; finding one argument they are completely unable to refute can be enough.
1sjadler2d
This was really interesting, thanks for putting yourself in that situation and for writing it up. I was curious what the examples of therapy speak in the conversation were, if you're down to elaborate.
anaguma's Shortform
anaguma12h2-1

I think it would be cool if someone made a sandbagging eval, measuring the difference in model capabilities when it is finetuned to do a task vs. when it is prompted to do a task. Right now I think the difference would be small for most tasks but this might change. 
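A minimal sketch of such an eval, under the assumption that a harness supplies a prompted solver and a task-finetuned solver for the same base model (the names below are hypothetical, not any existing eval API):

from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class SandbaggingResult:
    prompted_acc: float
    finetuned_acc: float

    @property
    def gap(self) -> float:
        # Elicitation gap: capability revealed by finetuning beyond what
        # prompting elicits. A large positive gap is (weak) evidence of
        # sandbagging or of under-elicitation by the prompt.
        return self.finetuned_acc - self.prompted_acc

def sandbagging_eval(
    tasks: Sequence[str],
    prompted_solver: Callable[[str], bool],
    finetuned_solver: Callable[[str], bool],
) -> SandbaggingResult:
    """Compare success rates on the same tasks when the model is merely
    prompted vs. when it has been finetuned to do the task."""
    prompted = sum(prompted_solver(t) for t in tasks) / len(tasks)
    finetuned = sum(finetuned_solver(t) for t in tasks) / len(tasks)
    return SandbaggingResult(prompted_acc=prompted, finetuned_acc=finetuned)

The hard part is making the finetuned condition a fair capability ceiling rather than just a second, noisier measurement.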

5StanislavKrym12h
I would guess that a sandbagging eval should be different from what you describe. I did try to sketch a potential way to measure sandbagging by comparing various benchmarks like METR, ARC-AGI, boardgames, etc, and developing scaling laws for every new architecture. If the laws break down, then the model is likely sandbagging on the evaluations. 
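A rough sketch of this scaling-law check (illustrative only; "compute" and "score" stand in for whichever compute estimates and benchmark scores one trusts, and the linear-in-log-compute form is an assumption):

import numpy as np

def fit_scaling_law(compute: np.ndarray, score: np.ndarray) -> tuple[float, float]:
    """Fit score ≈ a * log10(compute) + b on trusted reference models."""
    a, b = np.polyfit(np.log10(compute), score, deg=1)
    return a, b

def looks_like_sandbagging(a: float, b: float, new_compute: float,
                           new_score: float, tolerance: float = 0.1) -> bool:
    """Flag a new model whose benchmark score falls well below what the
    scaling law fitted to earlier models predicts for its compute."""
    predicted = a * np.log10(new_compute) + b
    return new_score < predicted - tolerance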
1anaguma12h
Interesting, perhaps that could work! Could you say more about why this would not be a good sandbagging eval? My intuition is that finetuning on a few examples would fully elicit a model's capabilities on a task, whereas it could sandbag given just the prompt.
StanislavKrym11h50

I have two arguments against it. 

  1. SOTA models are already trained with reinforcement learning on many tasks like coding, and Grok 4 required[1] as much compute for RL as for pre-training. Attempting to finetune the models using OOMs less compute than was spent on RL on similarly complex tasks is unlikely to elicit the capabilities.
  2. Sandbagging is supposed to be caused by models reasoning about the task and deciding that they shouldn't complete it too well even if they have the instinct to do so.  And then the models realise that they are being evaluated and
... (read more)
Adele Lopez's Shortform
Adele Lopez2d150

Continuation of conversation with Anna Salamon about community psychosis prevalence
 

Original thread: https://www.lesswrong.com/posts/AZwgfgmW8QvnbEisc/cfar-update-and-new-cfar-workshops?commentId=q5EiqCq3qbwwpbCPn

Summary of my view: I'm upset about the blasé attitude our community seems to have towards its high prevalence of psychosis. I think that CFAR/rationalist leadership (in addition to the community-at-large) has not responded appropriately.

I think Anna agrees with the first point but not the second. Let me know if that's wrong, Anna.

My hypothes... (read more)

Zian14h30

The existing literature (e.g. UpToDate) about psychosis in the general population could be a good source of priors. Or, is it safe to assume that Anna and you are already thoroughly familiar with the literature?

7Ben Pace20h
I just want to say that, while it has in the past been the case that a lot of people were very anti-exclusion, and some people are still that way, I certainly am not, this does not accurately describe Lightcone, and we are regularly involved in excluding or banning people for bad behavior. Most major events of a certain size that we are involved in running have involved some amount of this. I think this is healthy and necessary, and the attempt to include everyone, or to always make sure that whatever stray cat shows up on your doorstep can live in your home, is very unhealthy and led to a lot of past problems and hurtful dynamics. (There's lots more detail to this and how to do justice well that I'm skipping over; right now I'm just replying to this narrow point.)
2Viliam21h
Not sure if this is helpful, but instead of contrast, I see these as two sides of the same coin. If the world is X, then I am a person living in X. But if the world is actually Y, then I am a person living in Y. Both change. I can be a different person in the same world, but I can't be the same person in different worlds. At least if I take ideas seriously and I want to have an impact on the world.
FinalFormal2's Shortform
FinalFormal218h10

I believe that we're going to see heavy political and social instability over the next 5 years; how do I maximize my EV in light of this? Primarily I'm thinking about financial investments.

Some things I was thinking about: Gold in GDX, Cybersecurity in HACK, Options income in JEPI, Defense/Aerospace in ITA

Viliam's Shortform
Viliam20h172

For people who can't take a 30-day break for the Inkhaven Residency, I propose an alternative. I am writing about it here in case someone else also wants to do something, so that we could coordinate. Please respond here if you plan something; otherwise I will announce my plan officially in a LW post on Sept 30.

My current vision, which I provisionally call Halfhaven, is to make an online blog-writing group, probably on Discord, that will try to produce the same total output in twice as much time. That is 30 articles posted during October ... (read more)

3dirk20h
I know henryaj's also planning something; his comment is here in case you'd like to coordinate with him.
Viliam19h20

Thank you, I saw that comment previously but couldn't find it. I will contact henryaj.

jamjam's Shortform
jamjam20h110

Funny quote about covering AI as a journalist from a New York Times article about the drone incursions in Denmark. 

Then of course the same mix of uncertainty and mystery attaches to artificial intelligence (itself one of the key powers behind the drone revolution), whose impact is already sweeping — everyone’s stock market portfolio is now pegged to the wild A.I. bets of the big technology companies — without anyone really having clarity about what the technology is going to be capable of doing in 2027, let alone in 2035.

Since the job of the pundit is

... (read more)
Viliam20h31

That sounds impressively self-aware. Most journalists would just "predict" the future, expressing certainty.

Sinclair Chen's Shortform
Sinclair Chen20h-10

if you were looking for a sign from the universe, from the simulation, that you can stop working on AI, this is it. you can stop working on AI.

work on it if you want to.
don't work on it if you don't want to.
update how much you want to based on how it feels.
other people are working on it too.

if you work on AI, but are full of fear or depression or darkness, you are creating the danger. which is fine if you think that's funny but it's not fine if you unironically are afraid, and also building it.

if you work on AI, but are full of hopium, copium, and not-obse... (read more)

CapResearcher's Shortform
CapResearcher1d1-2

Empirical observation: healthy foods often taste bad.

Why do my taste buds like fat and sugar instead of a vegetable smoothie? Here's my half-serious attempt at applying formal reasoning to explain the phenomenon.

Proposition.
At a food consumption equilibrium, all healthy foods have significant downsides (tastiness, price, ease of preparation, etc.).

Proof.
We say that a food is "healthy" if the average person would benefit from eating more of it.

Consider an arbitrary food X which doesn't have significant downsides.

We assume that food consumption is at an eq... (read more)
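One way to formalize the visible part of the argument (my notation, not the author's): write H(x) > 0 for "the average person would benefit from eating more of x" and D(x) ≥ 0 for the total downside cost of eating more of x (taste, price, preparation effort). The equilibrium assumption is that nobody gains on net from eating more of any food:

\forall x:\; H(x) - D(x) \le 0, \qquad \text{so } H(x) > 0 \;\Rightarrow\; D(x) \ge H(x) > 0.

That is, any food that would still be beneficial to eat more of must be carrying a non-trivial downside, which is the proposition.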

Karl Krueger1d20

Many foods have been selected for palatability.

Some of this selection has taken place through natural selection: wild animals choosing to eat fruits they find tasty (and dispersing their seeds). In this case, because the animal consumer is also evolving, it's likely to end up favoring (and thus propagating) fruits that are actually nutritious too. After all, an animal that chose to eat foods that were bad for it would be less likely to survive and prosper. So wild animals' tastes and their food plants' properties coevolve into something resembling an equili... (read more)

Jemist's Shortform
J Bostock1d23

An early draft of a paper I'm writing went like this:

In the absence of sufficient sanity, it is highly likely that at least one AI developer will deploy an untrusted model: the developers do not know whether the model will take strategic, harmful actions if deployed. In the presence of a smaller amount of sanity, they might deploy it within a control protocol which attempts to prevent it from causing harm.

I had to edit it slightly. But I kept the spirit.

Shortform
lc2d*7627

The background of the Stanislav Petrov incident is literally one of the dumbest and most insane things I have ever read (attached screenshot below):

dr_s1d223

Appreciate that this means:

  • the US thought that somehow the smart thing to do with an angry and paranoid rival nuclear power was play games of chicken while going "come at me bro"
  • the USSR's response to this was to set up deterrence... by deploying people to secretly spy on the US guys to allow them to know beforehand if an attack was launched, so they could retaliate... which would have no deterrent effect if the US didn't know.

It's a wonder we're still here.

ceba's Shortform
ceba2d10

My working model of psychosis is "lack of a stable/intact ego", where my working model of an "ego" is "the thing you can use to predict your own actions so as to make successful multi-step plans, such as 'I will buy pasta, so that I can make it on Thursday for our guests.'"

from Adele Lopez's Shortform. I probably haven't experienced psychosis, but the description of self-prediction determining behavior/planning, and this self-prediction being faulty or unstable was eerie; this dominates my experience. I'm unsure about their definition of ego, I understood i... (read more)

StanislavKrym's Shortform
StanislavKrym2d*40

The AI sycophancy-related trance is probably some of the worst news in AI alignment. About two years ago someone proposed using prison guards to ensure that they aren't CONVINCED to release the AI. And now the AI demonstrates that even its primitive version can hypnotise the guards. Does this mean that human feedback should immediately be replaced with AI feedback, or with feedback on tasks with verifiable rewards? Or that everyone should copy the KimiK2 sycophancy-beating approach? And what if that instills the same misalignment issues in all models in the world?

Al... (read more)

Beth Barnes's Shortform
Beth Barnes2dΩ6103

FYI: METR is actively fundraising! 

METR is a non-profit research organization. We prioritise independence and trustworthiness, which shapes both our research process and our funding options. To date, we have not accepted payment from frontier AI labs for running evaluations.[1] 

Part of METR's role is to independently assess the arguments that frontier AI labs put forward about the safety of their models. These arguments are becoming increasingly complex and dependent on nuances of how models are trained and how mitigations were developed.

For this... (read more)

Neel Nanda2dΩ72110

Can you say anything about what METR's annual budget/runway is? Given that you raised $17mn a year ago, I would have expected METR to be well funded.

Habryka's Shortform Feed
habryka2d1430

In addition to Lighthaven, for which we have a mortgage, Lightcone owns an adjacent property that is fully unencumbered and worth around $1.2M. Lighthaven has basically been breaking even, but we still have a funding shortfall of about $1M for our annual interest payment for the last year, during which Lighthaven was ramping up utilization. It would be really great if we could somehow take out our real estate equity to cover that one-time funding shortfall.

If you want to have some equity in Berkeley real estate, and/or Lightcone's credit-worthiness, you m... (read more)
