Bioinfohazards
Best of LessWrong 2019

A thoughtful exploration of the risks and benefits of sharing information about biosecurity and biological risks. The authors argue that while there are real risks to sharing sensitive information, there are also important benefits that need to be weighed carefully. They provide frameworks for thinking through these tradeoffs. 

by Spiracular
12hamnox
Biorisk - well, wouldn't it be nice if we'd all been familiar with the main principles of biorisk before 2020? I certainly regretted sticking my head in the sand.
> If concerned, intelligent people cannot articulate their reasons for censorship, cannot coordinate around principles of information management, then that itself is a cause for concern. Discussions may simply move to unregulated forums, and dangerous ideas will propagate through well intentioned ignorance.
Well. It certainly sounds prescient in hindsight, doesn't it? Infohazards in particular cross my mind: so many people operate on extremely bad information right now. Conspiracy theories abound, and I imagine the legitimate coordination for secrecy surrounding the topic does not help in the least. What would help? Exactly this essay. A clear model of *what* we should expect well-intentioned secrecy to cover, so we can reason sanely about when it's obviously not. Y'all done good.
This taxonomy clarifies risk profiles better than Gregory Lewis's article, though I think his includes a few vivid-er examples. I opened a document to experiment with tweaking away a little of the dryness of the academic tone. I hope you don't take offense. Your writing represents a massive improvement in readability in its examples and taxonomy, and you make solid, straightforward choices in phrasing. No hopelessly convoluted sentence trees. I don't want to discount that. Seriously! Good job.
As I read, I had a few ideas spark on things that could likely get done at a layman level, in line with Spiracular's comment. That comment could use some expansion, especially in the direction of "prefer to discuss this over that, or discuss in *this way* over *that way*" for bad topics. Very relevantly, I think basic facts should get added to some of the good discussion topics, since they represent information it's better to disseminate!
472 · Welcome to LessWrong!, by Ruby, Raemon, RobertM, habryka · 6y · 74 comments
LW-Cologne meetup
Sat Jul 12•Köln
ACX/EA Lisbon July 2025 Meetup
Sat Jul 19•Lisbon
Biweekly AI Safety Comms Meetup
Tue Jul 22•Online
If Anyone Builds It, Everyone Dies: A Conversation with Nate Soares and Tim Urban
Sun Aug 10•Online
Burny3h101
2
> Noam Brown: "Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline" https://x.com/polynoamial/status/1946478249187377206
> "Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians."
> "We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling." https://x.com/alexwei_/status/1946477749566390348
So there's some new breakthrough...?
> "o1 thought for seconds. Deep Research for minutes. This one thinks for hours." https://x.com/polynoamial/status/1946478253960466454
> "LLMs for IMO 2025: gemini-2.5-pro (31.55%), o3 high (16.67%), Grok 4 (11.90%)." https://x.com/denny_zhou/status/1945887753864114438
So public LLMs are bad at the IMO, while internal models are getting gold medals? Fascinating.
Mikhail Samin13h282
1
Everyone should do more fun stuff![1]
I thought it'd just be very fun to develop a new sense. Remember vibrating belts and ankle bracelets that gave you a sense of the direction of north? (1, 2) I made some LLMs make me an iOS app that does this! Except the sense doesn't go away the moment you stop the app! I am pretty happy about it! I can tell where north is and have become much better at navigating and relating different parts of the (actual) territory in my map. Previously, I would remember my paths as collections of local movements (there, I turn left); now, I generally know where places are, and Google Maps feels much more connected to the territory.
If you want to try it, it's on TestFlight: https://testflight.apple.com/join/kKKfMuDq
It can vibrate when you face north; even better, if you're in headphones, it can give you spatial sounds coming from north; better still, a second before playing a sound coming from north, it can play a non-directional cue sound to make you anticipate the north sound and learn very quickly. None of this interferes with listening to any other kind of audio.
It's all probably less relevant to the US, as your roads are in a grid anyway; great for London, though.
If you know how to make it have more pleasant sounds, or optimize the directional sounds (make realistic binaural audio), or make react-native do nice vibrations when the app is in the background instead of bzzzz, and want to help, please do! The source code is on GitHub: https://github.com/mihonarium/sonic-compass/
1. ^ unless it would take too much time, especially given the short timelines
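For anyone curious how the cue-then-sound trick fits together, here is a minimal, purely illustrative Python sketch of the logic (the real app is a React Native project; none of the names or values below come from its code):

```python
import time

NORTH_TOLERANCE_DEG = 10   # how close to north counts as "facing north" (illustrative value)
CUE_LEAD_SECONDS = 1.0     # the non-directional cue plays this long before the north-anchored sound

def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two compass headings, in degrees."""
    d = abs(a_deg - b_deg) % 360
    return min(d, 360 - d)

def play_cue() -> None:
    """Stand-in for a short, non-directional anticipation sound."""
    print("cue")

def play_north_sound(heading_deg: float) -> None:
    """Stand-in for a spatialized sound placed at north relative to the listener.

    If the listener faces heading_deg, north sits at (-heading_deg) in
    head-relative coordinates, which is where the sound would be panned.
    """
    print(f"north sound at {(-heading_deg) % 360:.0f} degrees relative to the listener")

def tick(heading_deg: float) -> None:
    """One update of the (hypothetical) loop driven by compass readings."""
    play_cue()                      # neutral cue first, so the brain learns to anticipate...
    time.sleep(CUE_LEAD_SECONDS)
    play_north_sound(heading_deg)   # ...the directional sound that follows and anchors "north"
    if angular_distance(heading_deg, 0.0) < NORTH_TOLERANCE_DEG:
        print("vibrate")            # haptic pulse only when actually facing north

for heading in (90.0, 5.0, 270.0):  # example compass readings
    tick(heading)
```

The anticipatory cue is what makes the association easy to learn: the listener knows a directional sound is coming and can attend to where it lands.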
Daniel Kokotajlo2d*754
15
Epistemic status: Probably a terrible idea, but fun to think about, so I'm writing my thoughts down as I go.
Here's a whimsical, simple AGI governance proposal: "Cull the GPUs." I think of it as a baseline that other governance proposals should compare themselves to and beat.
The context in which we might need an AGI governance proposal: Suppose the world gets to a point similar to e.g. March 2027 in AI 2027. There are some pretty damn smart, pretty damn autonomous proto-AGIs that can basically fully automate coding, but they are still lacking in some other skills, so that they can't completely automate AI R&D yet, nor are they full AGI. But they are clearly very impressive, and moreover it's generally thought that full AGI is not that far off; it's plausibly just a matter of scaling up and building better training environments and so forth. Suppose further that enough powerful people are concerned about possibilities like AGI takeoff, superintelligence, loss of control, and/or concentration of power that there's significant political will to Do Something. Should we ban AGI? Should we pause? Should we xlr8 harder to Beat China? Should we sign some sort of international treaty? Should we have an international megaproject to build AGI safely? Many of these options are being seriously considered.
Enter the baseline option: Cull the GPUs. The proposal is: The US and China (and possibly other participating nations) send people to fly to all the world's known datacenters and chip production facilities. They surveil the entrances and exits to prevent chips from being smuggled out or in. They then destroy 90% of the existing chips (perhaps in a synchronized way, e.g. once teams are in place in all the datacenters, the US and China say "OK this hour we will destroy 1% each. In three hours, if everything has gone according to plan and both sides seem to be complying, we'll destroy another 1%. Etc."). Similarly, at the chip production facilities, a committee of representatives
Lao Mein1h20
1
Surrogacy costs ~$100,000-200,000 in the US. Foster care costs ~$25,000 per year. This puts the implied cost of government-created and raised children at ~$600,000. My guess is that this goes down greatly with economies of scale. Could this be cheaper than birth subsidies, especially as preferred family size continues to decrease with no end in sight?
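The implied figure checks out if you read it as mid-range surrogacy plus 18 years of foster care (my decomposition, not spelled out in the original):

```python
surrogacy = 150_000            # midpoint of the quoted ~$100,000-200,000 range
foster_care_per_year = 25_000  # quoted annual foster care cost
years_raised = 18              # birth through adulthood

implied_cost = surrogacy + foster_care_per_year * years_raised
print(implied_cost)            # 600000, matching the ~$600,000 figure
```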
Cole Wyeth19h*135
15
Since this is mid-late 2025, we seem to be behind the aggressive AI 2027 schedule? The claims here are pretty weak, but if LLMs really don’t boost coding speed, this description still seems to be wrong. [edit: okay actually it’s pretty much mid 2025 still, months don’t count from zero though probably they should because they’re mod 12]
habryka20h*3525
"Some Basic Level of Mutual Respect About Whether Other People Deserve to Live"?!
> And just, what? What? This is just such a wild thing to say in that context! "[D]eserve to live, or deserve to suffer"? People around here are, like, transhumanists, right? Everyone deserves to live! No one deserves to suffer! Who in particular was arguing that some people don't deserve to live or do deserve to suffer, such that this basic level of mutual respect is in danger of not being achieved?

Come on man, you have the ability to understand the context better.

First of all, retaliation clearly has its place. If someone acts in a way that wantonly hurts others, it is the correct choice to inflict some suffering on them, for the sake of setting the right incentives. It is indeed extremely common that from this perspective of fairness and incentives, people "deserve" to suffer. And indeed, maintaining an equilibrium in which the participants do not have outstanding grievances and would take the opportunity to inflict suffering on each other as payback for those past grievances is hard! Much of modern politics, many dysfunctional organizations, and many subcultures are indeed filled with mutual grievances, moving things far away from the mutual assumption that it's good not to hurt each other. I think almost any casual glance at Twitter would demonstrate this.

That paragraph of my response is about trying to establish that there are obviously limits to how much critical comments need the ability to offend, and so, if you want to view things through the lens of status, about how it's important to view status as multi-dimensional. It is absolutely not rare for internet discussion to imply the other side deserves to suffer or doesn't deserve to live. There is a dimension of status where being low enough does cause others to try to cause you suffering. It's not even that rare.

The reason why that paragraph is there is to establish how we need to treat status as a multi-dimensional thing. You can't just walk around saying "offense is necessary for good criticism". Some kinds of offense obviously make things worse in expectation. Other kinds of offense do indeed seem necessary. You are saying the exact same thing in the very next paragraph!

> If I had to guess, it's an implied strong definition of respect that bundles not questioning people's competence or stated intentions with being "treated like a person" (worthy of life and the absence of suffering)

No, it's the opposite. That's literally what my first sentence is saying. You cannot and should not treat respect/status as a one-dimensional thing, as the reductio-ad-absurdum in the quoted section shows. If you tried to treat it as a one-dimensional thing, you would need to include the part where people do of course frequently try to actively hurt others. In order to have a fruitful analysis of how status and offense relate to good criticism, you can't just treat the whole thing as one monolith.

> And just, what? What? This is just such a wild thing to say in that context!

I hope you now understand how it's not "such a wild thing to say in that context". Indeed, it's approximately the same thing you are saying here. You also hopefully understand how the exasperated tone and hyperbole did not help.

----------------------------------------

> But from the standpoint of the alleged aggressor who doesn't accept that notion of respect, we're not trying to say people should suffer and die. We just mean that opinion X is false, and that the process generating opinion X is untrustworthy, and perhaps actively optimizing in an objectionable direction.

You absolutely do not "just mean" those things. Communicating about status is hard and requires active effort to do well at. People get in active conflict with each other all the time. Just two days ago you were quoted by Benquo as saying "intend to fight it with every weapon at my disposal" regarding how you relate to LessWrong moderation, a statement exactly of the kind that does not breed confidence that you will not at some point reach for the "try to just inflict suffering on the LessWrong moderators in order to disincentivize them from doing this" option. People get exiled from communities. People get actually really hurt by social conflict. People build their lives around social trust and respect and reputation, and frequently would rather die than lose crucial forms of social standing they care about.

I do not believe your reports about how you claim to limit the range of your status claims, and what you mean by offense. You cannot wish away core dimensions of the stakes of social relationships by just asserting that you are not affecting them whenever their presence in the conversation would inconvenience you. You have absolutely called for extremely strong censure and punishment of many people in this community as a result of things they said on the internet. You do not have the trust, nor anything close enough to a track record of accurate communication on this topic, to make it so that when you assert that by "offense" you just mean purely factual claims, people should believe you.

Like, man, I am so tired of this. I am so tired of this repeated "oh no, I am absolutely not making any status claims, I am just making factual claims, you moron" game. You don't get to redefine the meaning of words, and you don't get to try to gaslight everyone you interface with about the real stakes of the social engagements they have with you.

I thought Wei Dai's comment was good. I responded to it, emphasizing how I think it's an important dimension to think through in these situations.

But indeed, the way you handle the nature of offense and status in comment threads is not to declare defeat, say that "well, seems like we just can't take into account social standing and status in our communication without sacrificing truth-seeking", and then pretend that dimension is never there. You have to actually work with detailed models of what is going on, figure out the incentives for the parties involved, and set up a social environment where good work gets rewarded, harmful actions punished, all while maintaining sufficient ability to talk about the social system itself without everyone trying to gaslight each other about it. It's hard work, it requires continuous steering. It requires hard thinking. It definitely is not solved by just making posts saying "We just mean that opinion X is false, and that the process generating opinion X is untrustworthy, and perhaps actively optimizing in an objectionable direction".

There is no "just" here. In invoking this you are implying some target social relationship to the people who are "perhaps actively optimizing in an objectionable direction". Should they be exiled, rate-limited, punished, forced to apologize, or celebrated? Your tone and words will communicate one of those! It's extremely hard and requires active effort to write a comment that genuinely communicates agnosticism about how they think a social ecosystem should react to people who are "optimizing in an objectionable direction" in a specific instance, and you are clearly not generally trying to do that. Your words reek of judgement of a specific kind. You frequently call for social punishment of people who optimize in such ways! You can't just deny that part of your whole speech and wish it away. There is no "just" here. When you offend, you mean offense of a specific kind, and using clinical language to hide away the nature of that offense, and its implications, is not helping people accurately understand what will happen when they engage with you.
AnnaSalamon12h2810
Believing In
A friend recently complained to me about this post: he said most people do much nonsense under the heading "belief", and that this post doesn't acknowledge this adequately. He might be right! Given his complaint, perhaps I ought to say clearly:
1) I agree — there is indeed a lot of nonsense out there masquerading as sensible/useful cognitive patterns. Some aimed to wirehead or mislead the self; some aimed to deceive others for local benefit; lots of it simple error.
2) I agree also that a fair chunk of nonsense adheres to the term "belief" (and the term "believing in"). This is because there's a real, useful pattern of possible cognition near our concepts of "belief", and because nonsense (/lies/self-deception/etc) likes to disguise itself as something real.
3) But — to sort sense from nonsense, we need to understand what the real (useful, might be present in the cogsci books of alien intelligences) pattern is that is near our "beliefs". If we don't:
* a) We'll miss out on a useful way to think. (This is the biggest one.)
* b) The parts of the {real, useful way to think} that fall outside our conception of "beliefs" will be practiced noisily anyway, sometimes; sometimes in a true fashion, sometimes mixed (intentionally or accidentally) with error or local manipulations. We won't be able to excise these deceptions easily or fully, because it'll be kinda clear there's something real nearby that our concept of "beliefs" doesn't do justice to, and so people (including us) will not wish to adhere entirely to our concept of "beliefs" in lieu of the so-called "nonsense" that isn't entirely nonsense. So it'll be harder to expel actual error.
4) I'm pretty sure that LessWrong's traditional concept of "beliefs" as "accurate Bayesian predictions about future events" is only half-right, and that we want the other half too, both for (3a)-type reasons and for (3b)-type reasons.
* a) "Beliefs" as accurate Bayesian predictions is exactly right for beliefs/predictions about things unaffected by the belief itself — beliefs about tomorrow's weather, or organic chemistry, or the likely behavior of strangers.
* b) But there's a different "belief-math" (or "believing-in math") that's relevant for coordinating pieces of oneself in order to take a complex action, and for coordinating multiple people so as to run a business or community or other collaborative endeavor. I think I lay it out here (roughly — I don't have all the math), and I think it matters.
The old LessWrong Sequences-reading crowd *sort of* knew about this — folks talked about how beliefs about matters directly affected by the beliefs could be self-fulfilling or self-undermining prophecies, and how Bayes-math wasn't defined around here. But when I read those comments, I thought they were discussing an uninteresting edge case. The idioms by which we organize complex actions (within a person, and between people) are part of the bread and butter of how intelligence works; they are not an uninteresting edge case. Likewise, people talked sometimes (on LW in the past) about how they were intentionally holding false beliefs about their start-ups' success odds; they were advised not to be clever, and some commenters dissented from this advice.
But IMO the "believing in" concept lets us distinguish:
* (i) the useful thing such CEOs were up to (holding a target, in detail, that they and others can coordinate action around);
* (ii) how to do this without having or requesting false predictions at the same time; and
* (iii) how sometimes such action on the part of CEOs/etc is basically "lying" (and "demanding lies"), in the sense that it is designed to extract more work/investment/etc from "allies" than said allies would volunteer if they understood the process generating the CEOs' behavior (and to demand that their team members be similarly deceptive/extractive). And sometimes it's not. And there are principles for telling the difference.
All of which is sort of to say that I think this model of "believing in" has substance we can use for the normal human business of planning actions together, and isn't merely propaganda to mislead people into thinking human thinking bugs are less buggy than they are. Also I think it's as true to the normal English usage of "believing in" as the historical LW usage of "belief" is to the normal English usage of "belief".
Cornelius Dybdahl2d3714
Critic Contributions Are Logically Irrelevant
Humans are social animals, and this is true even of the many LessWrongers who seem broadly in denial of this fact (itself strange, since Yudkowsky has endlessly warned them against LARPing as Vulcans, but whatever). The problem Duncan Sabien was getting at was basically the emotional effects of dealing with smug, snarky critics. Being smug and snarky is a gesture of dominance, and indeed is motivated by status-seeking (again, despite the opinion of many snarkers who seem to be in denial of this fact). If people who never write top-level posts proceed to engage in snark and smugness towards people who do, that's a problem, and they ought to learn a thing or two about proper decorum, not to mention about the nature of their own vanity (e.g. by reading Notes From Underground by Fyodor Dostoevsky).
Moreover, since top-level contributions ought to be rewarded with a certain social status, what those snarky critics are doing is an act of subversion. I am not opposed to subversion in principle, but subversion is fundamentally a kind of attack. This is why I can understand the "Killing Socrates" perspective, but without approving of it: Socrates was subverting something that genuinely merited subversion. But it is perfectly natural that people who are being attacked by subversives will be quite put off by it.
Afaict, the emotional undercurrent to this whole dispute is the salient part, but there is here a kind of intangible taboo against speaking candidly about the emotional undercurrent underlying intellectual arguments.
mattmacdermott's Shortform
mattmacdermott
2y
Alexander Gietelink Oldenziel1m20

What do you mean?

OpenAI Claims IMO Gold Medal
23
Mikhail Samin
3h
This is a linkpost for https://x.com/alexwei_/status/1946477742855532918

I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.


Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the

...
(See More – 225 more words)
Leon Lang8m20

The proofs look very different from how LLMs typically write, and I wonder how that emerged. Much more concise. Most sentences are not fully grammatically complete. A bit like how a human would write if they don't care about form and only care about content and being logically persuasive. 

10Mikhail Samin43m
"We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling"
Burny's Shortform
Burny
25d
10Burny3h
> Noam Brown: "Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline" https://x.com/polynoamial/status/1946478249187377206
> "Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians."
> "We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling." https://x.com/alexwei_/status/1946477749566390348
So there's some new breakthrough...?
> "o1 thought for seconds. Deep Research for minutes. This one thinks for hours." https://x.com/polynoamial/status/1946478253960466454
> "LLMs for IMO 2025: gemini-2.5-pro (31.55%), o3 high (16.67%), Grok 4 (11.90%)." https://x.com/denny_zhou/status/1945887753864114438
So public LLMs are bad at the IMO, while internal models are getting gold medals? Fascinating.
testingthewaters8m30

More interesting than the score is the implication that these were pass@1 results, i.e. the model produced a single final "best shot" for each question that, at the end of 4.5 hours, was passed off to human graders, instead of pass@1000 with literal thousands of automated attempts. If true, this suggests that test-time scaling is now moving away from the "spray and pray" paradigm. Feels closer to "actually doing thinking". This is kinda scary.

2Thane Ruthenis41m
Well, that's mildly unpleasant. But not that unpleasant, I guess.
I really wonder what people think when they see a benchmark on which LLMs get 30%, and then confidently say that 80% is "years away". Obviously, if LLMs already get 30%, it proves they're fundamentally capable of solving that task[1], so the benchmark will be saturated once AI researchers do more of the same. Hell, Gemini 2.5 Pro apparently got 5/7 (71%) on one of the problems, so clearly outputting 5/7-tier answers to IMO problems was a solved problem, and an LLM getting at least 6*5 = 30 out of 42 in short order should have been expected. How was this not priced in...?
Hmm, I think there's a systemic EMH failure here. People appear to think that time-to-benchmark-saturation scales with the difference between the status of a human able to reach the current score and the status of a human able to reach the target score, instead of estimating it using gears-level models of how AI works.
You can probably get free Manifold mana by looking at supposedly challenging benchmarks, looking at which ones have above-10% scores already, then being more bullish on them than the market. ARC-AGI-2 seems like the obvious one. I'd give it >60% that it gets to >50% this year (unless the Grok 4 result is a lie), as opposed to this market's 16% (well, I made it 22% now). Honestly, this OpenAI model probably already gets that; literal free money.
I don't like the sound of that, but if this is their headline result, I'm still sleeping, and my update is that people are bad at thinking about benchmarks.
1. ^ Unless the benchmark has difficulty tiers the way e.g. FrontierMath does, which I think the IMO doesn't.
"Some Basic Level of Mutual Respect About Whether Other People Deserve to Live"?!
16
Zack_M_Davis
1d

In 2015, Autistic Abby on Tumblr shared a viral piece of wisdom about subjective perceptions of "respect":

Sometimes people use "respect" to mean "treating someone like a person" and sometimes they use "respect" to mean "treating someone like an authority"

and sometimes people who are used to being treated like an authority say "if you won't respect me I won't respect you" and they mean "if you won't treat me like an authority I won't treat you like a person"

and they think they're being fair but they aren't, and it's not okay.

There's the core of an important insight here, but I think it's being formulated too narrowly. Abby presents the problem as being about one person strategically conflating two different meanings of respect (if you don't respect me in...

(See More – 893 more words)
Said Achmiz15m20

> I don’t see why it’s good to punish people. If you threaten to punish me if I do a particular thing, I’ll just get upset that you might hurt me and likely refuse to interact with you at all.

Try to apply this logic to law enforcement, and you will see at once how it fails.

4dr_s42m
I don't think this refers necessarily to intentional malice. Suppose there is someone who makes important, impactful decisions based on astrology. You can't just tell them "hey you made a silly mistake reading the precise position of Mercury retrograde here, it happens". You have to say "astrology is bunk and basing your decisions on it is dangerous". But in a culture in which the rule is "if someone strongly enough believes in something - like astrology - that they've built their entire identity around it, attacking that something is the same as an attack on their person which will inflict suffering on them, and therefore shouldn't be done", that action is taboo. Which is the problem that the post gestures at, I think. Of course one can argue that maybe it's strategically better to not go too hard - if for example astrology is a majority belief and most people will side with the other person. But that's a different story. If saying "hey people, this guy believes in astrology! Stop listening to him!" is enough to make them lose status, should you be able to do it or not? Which is more important, their personal sense of validation, or protecting the community from the consequences of their wrong beliefs?
2MondSemmel5h
(And the LW team doesn't exempt itself from this rule, e.g. this podcast with Habryka was considered to be a Personal Blog.)
4Zack_M_Davis6h
Honestly, a lot of my work on this website consists of trying to write "the generalized version" of something that's bothering me that would not otherwise be of philosophical interest. I just think this has a pretty good track record of being philosophically productive! For example, you yourself have linked to my philosophy of language work, even though you probably don't care about the reason I originally got so obsessed with the philosophy of language in the first place. To me, that's an encouraging sign that I got the philosophy right (rather than the philosophy being thinly-veiled politics).
leogao's Shortform
leogao
Ω 33y
Alexander Gietelink Oldenziel17m20

academia is too broad of a term. most of math, physics, theoretical CS, paleontology, material sciences, engineering, and some branches of economics, biology, (computational) neuroscience, (computational) linguistics, statistics, etc. are doing well and overall reward intellectual freedom and deep work. in terms of people this is a small minority of total academics, probably <5%.

It is true that many subfields, or even entire domains of science are diseased disciplines. Most of the research is marginal, irrelevant, reinventin... (read more)

Are agent-action-dependent beliefs underdetermined by external reality?
18
Said Achmiz
2d

(This is a comment that has been turned into a post.)

The standard rationalist view is that beliefs ought properly to be determined by the facts, i.e. the belief “snow is white” is true iff snow is white.

Contrariwise, it is sometimes claimed (in the context of discussions about “postrationalism”) that:

even if you do have truth as the criterion for your beliefs, then this still leaves the truth value of a wide range of beliefs underdetermined

This is a broad claim, but here I will focus on one way in which such a thing allegedly happens:

… there are a wide variety of beliefs which are underdetermined by external reality. It’s not that you intentionally have fake beliefs which are out of alignment with the world, it’s that some beliefs are to

...
(Continue Reading – 1551 more words)
Said Achmiz18m20

Ok, I’ve now read the linked post.

As far as I can tell, the account of decision-dependent beliefs described in that post is entirely compatible with what I say here.

(The account of “belief-dependent beliefs”, if you will, is a different matter; but I make no claims about that, in this post. Also, I think that the notion of “world reacts to agent’s beliefs”, as described there and elsewhere, is confused in an important way, but that’s a discussion for another time.)

On the whole, I must admit that I’m slightly confused about what you were getting at, with that link.

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations
1
Yuxiao
19m

by Yuxiao Li, Zachary Baker, Maxim Panteleev, Maxim Finenko

June 2025 | SPAR Spring '25

A post in our series "Feature Geometry & Structured Priors in Sparse Autoencoders"

TL;DR: We explore the intrinsic block-diagonal geometry of LLM feature space--first observed in raw embeddings and family-tree probes--by measuring cosine-similarity heatmaps. These diagnostics set the stage for baking block-structured and graph-Laplacian priors into V-SAEs and Crosscoders in later posts. Assumptions. tbd.
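To make the TL;DR concrete, here is a toy sketch of the kind of cosine-similarity-heatmap diagnostic described above (synthetic block-structured vectors standing in for LM activations or SAE feature directions; this is illustrative only, not the authors' code or data):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Toy stand-in for feature directions: three "blocks" of features, where features
# within a block share a common latent direction plus noise (hypothetical data).
d_model, block_sizes = 64, [20, 30, 25]
features = []
for size in block_sizes:
    shared = rng.normal(size=d_model)
    features.append(shared + 0.5 * rng.normal(size=(size, d_model)))
F = np.concatenate(features, axis=0)  # shape: (n_features, d_model)

# Cosine-similarity heatmap: block-diagonal structure shows up as bright squares
# along the diagonal when related features cluster into groups.
F_norm = F / np.linalg.norm(F, axis=1, keepdims=True)
cos_sim = F_norm @ F_norm.T

plt.imshow(cos_sim, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="cosine similarity")
plt.title("Toy block-diagonal cosine-similarity heatmap")
plt.savefig("toy_cosine_heatmap.png")
```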

About this series

This is the second post of our series on how realistic feature geometry in language model (LM) embeddings can be discovered and then encoded into sparse autoencoder (SAE) priors. Since February, we have combined probabilistic modeling, geometric analysis, and mechanistic interpretability.

Series Table of Contents

Part I: Toy model comparison of isotropic vs global-correlation priors in V-SAE

➡️ Part II (you are here):...

(See More – 989 more words)
I bet $500 on AI winning the IMO gold medal by 2026
37
azsantosk
2y

The bet was arranged on Twitter between @MichaelVassar and me (link).

Conditions are similar to this question on Metaculus, except for the open-source condition (I win even if the AI is closed-source, and in fact I would very much prefer it to be closed-source).

@Zvi has agreed to adjudicate this bet in case there is no agreement on resolution.


Michael has asked me two questions by email, and I'm sharing my answers.

Any thoughts on how to turn winning these sorts of bets into people actually updating?

Geoffrey Hinton mentioned recently that, while GPT4 can "already do simple reasoning", "reasoning is the area where we're still better" [source].

It seems to me that, after being able to beat humans at math, there won't be anything else fundamental where we're still better. I wish...

(See More – 146 more words)
azsantosk32m10

OpenAI apparently announced today (19/07/2025) their AI has won the IMO gold medal.

https://x.com/alexwei_/status/1946477742855532918?s=46

Lao Mein's Shortform
Lao Mein
3y
2Lao Mein1h
Surrogacy costs ~$100,000-200,000 in the US. Foster care costs ~$25,000 per year. This puts the implied cost of government-created and raised children at ~$600,000. My guess is that this goes down greatly with economies of scale. Could this be cheaper than birth subsidies, especially as preferred family size continues to decrease with no end in sight?
ACCount40m10

I question your guess.

Childcare is similar to education and medicine in that it's cursed to suffer from piss poor economies of scale forever.

Or, at least, until advanced AI+robots can straight up replace the human labor involved. In which case - are high birth rates even desirable?

230 · So You Think You've Awoken ChatGPT, by JustisMills · 3d · 67 comments
142 · An Opinionated Guide to Using Anki Correctly, by Luise · 6d · 49 comments
499 · A case for courage, when speaking of AI danger, by So8res · 12d · 125 comments
268 · Generalized Hangriness: A Standard Rationalist Stance Toward Emotions, by johnswentworth · 9d · 26 comments
85 · Love stays loved (formerly "Skin"), by Swimmer963 (Miranda Dixon-Luinenburg) · 17h · 1 comment
230 · So You Think You've Awoken ChatGPT, by JustisMills · 3d · 67 comments
160 · Ω · Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety, by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah · 4d · 24 comments
195 · the jackpot age, by thiccythot · 8d · 15 comments
192 · Surprises and learnings from almost two months of Leo Panickssery, by Nina Panickssery · 7d · 12 comments
478 · What We Learned from Briefing 70+ Lawmakers on the Threat from AI, by leticiagarcia · 2mo · 15 comments
543 · Orienting Toward Wizard Power, by johnswentworth · 2mo · 146 comments
346 · A deep critique of AI 2027’s bad timeline models, by titotal · 1mo · 39 comments
173 · Lessons from the Iraq War for AI policy, by Buck · 9d · 24 comments
363 · Ω · the void, by nostalgebraist · 1mo · 105 comments
125 · Ω · Narrow Misalignment is Hard, Emergent Misalignment is Easy, by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda · 5d · 21 comments