Quick Takes


Why the focus on wise AI advisors?[1]

I'll be writing up a proper post to explain why I've pivoted towards this, but it will still take some time to produce a high-quality post, so I decided it was worthwhile releasing a short-form description in the meantime.

By Wise AI Advisors, I mean training an AI to provide wise advice.

a) AI will have a massive impact on society given the infinite ways to deploy such a general technology
b) There are lots of ways this could go well and lots of ways that this could go extremely poorly (election interference, cyber attac... (read more)


Despite my contention on the associated paper post that focusing on wisdom in this sense is ducking the hard part of the alignment problem, I'll stress here that it seems thoroughly useful if it's a supplement to, not a substitute for, work on the hard parts of the problem - technical, theoretical and societal.

I also think it's going to be easier to create wise advisors than you think, at least in the weak sense that they make their human users effectively wiser.

In short, I think simple prompting schemes and eventually agentic scaffolds can do a lot of the extra... (read more)

Chris_Leong
Well, we're going to be training AI anyway. If we're just training capabilities, but not wisdom, I think things are unlikely to go well. More thoughts on this here.
Seth Herd
Hm, I thought this use of "wise" is almost identical to capabilities. It's sort of like capabilities with less slop or confabulation, and probably more ability to take the context of the problem/question into account. Both of those are pretty valuable, although people might not want to bother even swerving capabilities in that direction.

Why does ChatGPT voice-to-text keep translating me into Welsh?

I use ChatGPT voice-to-text all the time. About 1% of the time, the message I record in English gets seemingly-roughly-correctly translated into Welsh, and ChatGPT replies in Welsh. Sometimes my messages go back to English on the next message, and sometimes they stay in Welsh for a while. Has anyone else experienced this?

Example: https://chatgpt.com/share/67e1f11e-4624-800a-b9cd-70dee98c6d4e

There's a good chance that "postpone the intelligence explosion for a few centuries" is a better motto than "stop AI" or "pause AI".

Someone should do some "market research" on this question.


Maybe "motto" is the wrong word. I mean to gesture at words to use in a comment or in a conversation.

Mateusz Bagiński
Some people misinterpret/mispaint them(/us?) as "luddites" or "decels" or "anti-AI-in-general" or "anti-progress". Is it their(/our?) biggest problem, one of their(/our?) bottlenecks? Most likely no. It might still make sense to make marginal changes that make it marginally harder to do that kind of mispainting / reduce misinterpretative degrees of freedom.
Garrett Baker
I suspect the vast majority of that sort of name-calling is much more politically motivated than based on not seeing the right slogans. For example, if you go to Pause AI's website the first thing you see is a big, bold [...], and AI pause advocates are constantly arguing "no, we don't actually believe that" to the people who call them "luddites", but I have never actually seen anyone change their mind based on such an argument.

Shower thought I had a while ago:

Everybody loves a meritocracy until people realize that they're the ones without merit. I mean, you never hear someone say things like:

> I think America should be a meritocracy, ruled by skill rather than personal characteristics or family connections. I mean, I love my son, and he has a great personality. But let's be real: if we lived in a meritocracy, he'd be stuck at entry level.

(I framed the hypothetical this way because I want to exclude senior people very secure in their position who are performatively pushing for me... (read more)

There is a contingent of people who want excellence in education (e.g. Tracing Woodgrains) and are upset about e.g. the deprioritization of math and gifted education and SAT scores in the US. Does that not count?

Given that ~no one really does this, I conclude that very few people are serious about moving towards a meritocracy.

This sounds like an unreasonably high bar for us humans. You could apply it to all endeavours, and conclude that "very few people are serious about <anything>". Which is true from a certain perspective, but also stretches the word "serious" far past how it's commonly understood.

In my post on value systematization, I used utilitarianism as a central example.

Value systematization is important because it's a process by which a small number of goals end up shaping a huge amount of behavior. But there's another different way in which this happens: core emotional motivations formed during childhood (e.g. fear of death) often drive a huge amount of our behavior, in ways that are hard for us to notice.

Fear of death and utilitarianism are very different. The former is very visceral and deep-rooted; it typically inf... (read more)

Reminds me of Maslow's pyramid.

I wrote an article about values, arguing that the supreme value is life and that every other value derives from it.
Fair warning: it most probably does not align with your view at first glance:
https://www.lesswrong.com/posts/xx3St4KC3KHHPGfL9/human-alignment

robo
I don't think it's System 1 doing the systematization. Evolution beat fear of death into us in lots of independent forms (fear of heights, snakes, thirst, suffocation, etc.), but for the same underlying reason. Fear of death is not just an abstraction humans invented or acquired in childhood; it is a "natural idea" pointed at by our brain's innate circuitry from many directions. Utilitarianism doesn't come with that scaffolding. We don't learn to systematize Euclidean and Minkowskian spaces the same way either.
Thane Ruthenis
I think that's right. Taking on the natural-abstraction lens, there is a "ground truth" to the "hierarchy of values". That ground truth can be uncovered either by "manual"/symbolic/System-2 reasoning, or by "automatic"/gradient-descent-like/System-1 updates, and both processes would converge to the same hierarchy. But in the System-2 case, the hierarchy would be clearly visible to the conscious mind, whereas the System-1 route would make it visible only indirectly, by the impulses you feel. I don't know about the conflict thing, though. Why do you think System 2 would necessarily oppose System 1's deepest motivations?

Some interesting thoughts on (in)efficient markets from Byrne Hobart, worth considering in the context of Inadequate Equilibria.

> When a market anomaly shows up, the worst possible question to ask is "what's the fastest way for me to exploit this?" Instead, the first thing to do is to steelman it as aggressively as possible, and try to find any way you can to rationalize that such an anomaly would exist. Do stocks rise on Mondays? Well, maybe that means savvy investors have learned through long experience that it's a good idea to take off risk before the wee

... (read more)

It seems like my workshops would generally work better if they were spaced out over 3 Saturdays, instead of crammed into 2.5 days in one weekend. 

This would give people more time to try applying the skills in their day to day, and see what strategic problems they actually run into each week. Then on each Saturday, they could spend some time reviewing last week, thinking about what they want to get out of this workshop day, and then making a plan for next week.

My main hesitation is I kind of expect people to flake more when it's spread out over 3 weeks... (read more)

I tried reading HPMOR and didn't like it. As a vehicle for rationality, it's definitely conveying its message wholeheartedly, but I really don't understand the decision to disguise it as a Harry Potter fanfiction, especially when the fanfiction element feels strangely written and empty. That being said, I am absolutely in the minority here, so it's likely I'm simply missing something. Since I'm only a few chapters in, I'll try and continue it - but my hopes are not up at all.

A snowclone summarizing a handful of baseline important questions-to-self: "What is the state of your X, and why is that what your X's state is?" Obviously there are also versions that are phrased less generally and more naturally; that's just the most obviously parametrized form of the snowclone.

Classic(?) examples:
"What do you (think you) know, and why do you (think you) know it?" (X = knowledge/belief)
"What are you doing, and why are you doing it?" (X = action(-direction?)/motivation?)

Less classic examples that I recognized or just made up:
"How do you feel, and w... (read more)

Richard_Kennaway
"What is the state and progress of your soul, and what is the path upon which your feet are set?" (X = alignment with yourself) I affected a quasi-religious vocabulary, but I think this has general application. "What are you trying not to know, and why are you trying not to know it?" (X = self-deceptions)

I really like the 'trying not to know' one, because there are lots of things I'm trying not to know all the time (for attention-conservation reasons), but I don't think I have very good strategies for auditing the list.


This is called the Hurwicz decision rule / criterion (your t is usually written α).

I think the content of this argument is not that maxmin is fundamental, but rather that simplicity priors "look like" or justify Hurwicz-like decision rules. Simple versions of this are easy to prove but (as far as I know) do not appear in the literature.
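For reference, a standard statement of the criterion (writing α for your t; conventions differ on whether α weights the worst or best case):

\[ H(a) \;=\; \alpha \min_{s} u(a,s) \;+\; (1-\alpha) \max_{s} u(a,s) \]

With this convention, α = 1 recovers maximin and α = 0 recovers maximax.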

Vanessa Kosoy
The following are my thoughts on the definition of learning in infra-Bayesian physicalism (IBP), which is also a candidate for the ultimate prescriptive agent desideratum.

In general, learning of hypotheses about the physical universe is not possible because of traps. On the other hand, learning of hypotheses about computable mathematics is possible in the limit of ample computing resources, as long as we can ignore side effects of computations. Moreover, learning computable mathematics implies approximating Bayesian planning w.r.t. the prior about the physical universe. Hence, we focus on this sort of learning.

We consider an agent comprised of three modules, which we call Simulator, Learner and Controller. The agent's history consists of two phases. In the Training phase, the Learner interacts with the Simulator, and at the end produces a program for the Controller. In the Deployment phase, the Controller runs the program. Roughly speaking:

  • The Simulator is a universal computer whose function is performing computational experiments, which we can think of as "thought experiments" or coarse-grained simulations of potential plans. It receives commands from the Learner (which computations to run / threads to start/stop) and reports the results back to the Learner. We denote the Simulator's input alphabet by I_S and its output alphabet by O_S.
  • The Learner is the machine learning (training) module. The algorithm whose desideratum we're specifying resides here.
  • The Controller (as in "control theory") is a universal computer connected to the agent's external interface (environmental actions A and observations O). It's responsible for real-time interaction with the environment, and we can think of it as the learned policy. It is programmed by the Learner, for which purpose it has input alphabet I_C.

We will refer to this as the SiLC architecture.

Let H ⊆ □Γ be our hypothesis class about computable mathematics. Let ξ : Γ → □2^Γ be our prior about the physical universe[1]. These h
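As a purely illustrative toy sketch of the SiLC control flow (Python stand-ins of my own invention; the actual objects are the alphabets I_S, O_S, I_C and measure-theoretic hypotheses, not strings):

```python
# Toy skeleton of the SiLC architecture: Training phase (Learner drives the
# Simulator), then Deployment phase (Controller runs the learned program).
from typing import Protocol


class Simulator(Protocol):
    """Universal computer for computational 'thought experiments'."""
    def execute(self, command: str) -> str:  # command in I_S, report in O_S
        ...


class Controller(Protocol):
    """Real-time interface: environmental actions A, observations O."""
    def load(self, program: str) -> None:    # programmed via I_C
        ...
    def act(self, observation: str) -> str:
        ...


class Learner:
    """Training module; the algorithm whose desideratum we're specifying."""
    def train(self, sim: Simulator) -> str:
        report = sim.execute("run: coarse-grained simulation of plan 0")
        # ...the actual learning algorithm would go here...
        return f"policy informed by {report!r}"


def lifecycle(sim: Simulator, ctrl: Controller, learner: Learner) -> None:
    program = learner.train(sim)  # Training phase
    ctrl.load(program)            # hand-off: Learner programs the Controller
    # Deployment phase: the Controller now interacts with the environment.
```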
Vanessa Kosoy
Two thoughts about the role of quining in IBP:

  • Quines are non-unique (there can be multiple fixed points). This means that, viewed as a prescriptive theory, IBP produces multi-valued prescriptions. It might be the case that this multi-valuedness can resolve problems with UDT such as Wei Dai's 3-player Prisoner's Dilemma and the anti-Newcomb problem[1]. In these cases, a particular UDT/IBP (corresponding to a particular quine) loses to CDT. But a different UDT/IBP (corresponding to a different quine) might do as well as CDT.
  • What to do about agents that don't know their own source code? (Arguably humans are such.) Upon reflection, this is not really an issue! If we use IBP prescriptively, then we can always assume quining: IBP is just telling you to follow a procedure that uses quining to access its own (i.e. the procedure's) source code. Effectively, you are instantiating an IBP agent inside yourself with your own prior and utility function. On the other hand, if we use IBP descriptively, then we don't need quining: any agent can be assigned "physicalist intelligence" (Definition 1.6 in the original post; this can also be extended to not require a known utility function and prior, along the lines of ADAM) as long as the procedure doing the assigning knows its source code. The agent doesn't need to know its own source code in any sense.

[1] @Squark is my own old LessWrong account.

I'm looking for recommendations for frameworks/tools/setups that could facilitate machine-checked math manipulations.

More details:

  • My current use case is to show that an optimized reshuffling of a big matrix computation is equivalent to the original unoptimized expression
    • Need to be able to index submatrices of arbitrarily sized matrices
  • I tried doing the manipulations with some CASes (SymPy and Mathematica), which didn't work at all (see the sketch after this list for the flavor of the problem)
    • IIUC the main reason they didn't work is that they couldn't handle the indexing thing
  • I have very little experience with proof a
... (read more)
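For concreteness, a hypothetical minimal example of the kind of identity involved; fixed block structure like this is the part SymPy can handle, while the arbitrary symbolic index ranges mentioned above are where it broke down:

```python
# Hypothetical minimal example: SymPy can verify some fixed block-structure
# identities, but not manipulations over arbitrary symbolic index ranges.
from sympy import MatrixSymbol, BlockMatrix, block_collapse, Symbol

n = Symbol('n', positive=True, integer=True)
A, B, C, D = (MatrixSymbol(name, n, n) for name in "ABCD")

M = BlockMatrix([[A, B], [C, D]])

# Transposing the block matrix transposes and swaps the blocks:
assert block_collapse(M.T) == BlockMatrix([[A.T, C.T], [B.T, D.T]])
```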

Here's a place where I feel like my models of romantic relationships are missing something, and I'd be interested to hear people's takes on what it might be.

Background claim: a majority of long-term monogamous, hetero relationships are sexually unsatisfying for the man after a decade or so. Evidence: Aella's data here and here are the most legible sources I have on hand; they tell a pretty clear story where sexual satisfaction is basically binary, and a bit more than half of men are unsatisfied in relationships of 10 years (and it keeps getting worse from ... (read more)


An effect I noticed: going through Aella's correlation matrix (with poorly labeled columns, sadly), a feature which strongly correlates with the length of a relationship is codependency. Plotting question 20, "The long-term routines and structure of my life are intertwined with my partner's" (li0toxk), assuming that's what "codependency" refers to:

The shaded region is a 95% posterior estimate for the mean of the distribution, conditioned on the time-range (every 2 years) and cis-male respondents, with prior [...].
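(For anyone wanting to reproduce the band: a sketch of one way such an interval could be computed, assuming a normal likelihood with a Jeffreys prior; the actual prior used above wasn't preserved in the text.)

```python
# Sketch (an assumption, not necessarily the plot's method): 95% credible
# interval for a group mean under a normal likelihood with Jeffreys prior,
# where the posterior of the mean is Student-t.
import numpy as np
from scipy import stats

def posterior_mean_interval(x, level=0.95):
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
    post = stats.t(df=n - 1, loc=xbar, scale=s / np.sqrt(n))
    return post.interval(level)

# e.g. one 2-year bucket of satisfaction scores:
print(posterior_mean_interval(np.array([3.0, 4.0, 2.0, 5.0, 3.0])))
```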

Note also that codependency and sex sat... (read more)

johnswentworth
That is useful, thanks. Any suggestions for how I can better ask the question to get useful answers without apparently triggering so many people so much? In particular, if the answer is in fact "most men would be happier single but are ideologically attached to believing in love", then I want to be able to update accordingly. And if the answer is not that, then I want to update that most men would not be happier single. With the current discussion, most of what I've learned is that lots of people are triggered by the question, but that doesn't really tell me much about the underlying reality.
Jonas Hallgren
I thought I would give you another causal model, based on neuroscience, which might help. I think your models are missing a core biological mechanism: nervous system co-regulation. Most analyses of relationship value focus on measurable exchanges (sex, childcare, financial support), but overlook how humans are fundamentally regulatory beings. Our nervous systems evolved to stabilize through connection with others. When you share your life with someone, your biological systems become coupled. This creates several important values:

1. Your stress response systems synchronize and buffer each other. A partner's presence literally changes how your body processes stress hormones - creating measurable physiological benefits that affect everything from immune function to sleep quality.
2. Your capacity to process difficult emotions expands dramatically with someone who consistently shows up for you, even without words.
3. Your nervous system craves predictability. A long-term partner represents a known regulatory pattern that helps maintain baseline homeostasis - creating a biological "home base" that's deeply stabilizing.

For many men, especially those with limited other sources of deep co-regulation, these benefits may outweigh sexual dissatisfaction. Consider how many men report feeling "at peace" at home despite minimal sexual connection - their nervous systems are receiving significant regulatory benefits.

This also explains why leaving feels so threatening beyond just practical considerations. Disconnecting an integrated regulatory system that has developed over years registers in our survival-oriented brains as a fundamental threat.

This isn't to suggest people should stay in unfulfilling relationships - rather, it helps explain why many do, and points to the importance of developing broader regulatory networks before making relationship transitions.

From Brian Potter's Construction Physics newsletter I learned about Taara, framed as "Google's answer to Starlink" re: remote internet access, using ground-based optical communication instead of satellites ("fiber optics without the fibers"; Taara calls them "light bridges"). I found this surprising. Even more surprisingly, Taara isn't just a pilot but a moneymaking endeavor if this Wired passage is true:

> Taara is now a commercial operation, working in more than a dozen countries. One of its successes came in crossing the Congo River. On one side was Brazza

... (read more)

According to a friend of mine in AI, there's a substantial contingent of people who got into AI safety via Friendship is Optimal. They will only reveal this after several drinks. Until then, they will cite HPMOR. Which means we are probably overvaluing HPMOR and undervaluing FIO.

mako yass
I wonder if maybe these readers found the story at that time as a result of first being bronies, and I wonder if bronies still think of themselves as a persecuted class.
iceman
I'll pass, but thanks.

Have you thought about making an altered version that strips out enough of the My Little Pony-IP to be able to sell the book on Amazon KDP? (or let someone else do that for you if you don't want to do the work?) 

Here's a game-theory game I don't think I've ever seen explicitly described before: Vicious Stag Hunt, a two-player non-zero-sum game elaborating on both Stag Hunt and Prisoner's Dilemma. (Or maybe Chicken? It depends on the obvious dials to turn. This is frankly probably a whole family of possible games.)

The two players can pick from among 3 moves: Stag, Hare, and Attack.

Hunting stag is great, if you can coordinate on it. Playing Stag costs you 5 coins, but if the other player also played Stag, you make your 5 coins back plus another 10.

Hunting hare is fi... (read more)

links 3/24/25: https://roamresearch.com/#/app/srcpublic/page/03-24-2025

  • https://www.statecraft.pub/p/how-to-commit-a-coup Edward Luttwak's thoughts on foreign policy and the CIA's refusal to recruit people who know foreign languages (because people who travel abroad are thought too risky to clear)
  • pregnancy affects thyroid hormone levels.
    • TSH is typically lower during pregnancy, while T3 and T4 are higher. this means that using non-pregnant reference ranges may miss cases of hypothyroidism relative to "normal" pregnancy levels. https://pmc.ncbi.nlm.nih.gov/ar
... (read more)

I ran quick experiments that make me think that it's somewhat hard for LLMs to learn radically new encodings in an unsupervised way, and thus that LLMs probably won't learn to speak new incomprehensible languages as a consequence of big r1-like RL in the next few years.

The experiments

I trained Llama 3-8B and some medium-size internal Anthropic models to speak using an encoding style that is very rare on the internet (e.g. map each letter to a random name, and join the names) with SFT on the encoded text and without providing translation pairs. I find ... (read more)
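To make the setup concrete, here's a toy version of the letter-to-name encoding described above (this particular name table is my own invention; the real mapping isn't given):

```python
# Toy reconstruction of the encoding: map each letter to a random name,
# then join the names with spaces.
import random
import string

rng = random.Random(0)
names = ["Ava", "Ben", "Cleo", "Dan", "Eli", "Fay", "Gus", "Hal", "Ivy",
         "Jon", "Kim", "Lou", "Mia", "Ned", "Oz", "Pam", "Quinn", "Rex",
         "Sam", "Ted", "Uma", "Vic", "Wes", "Xan", "Yara", "Zed"]
rng.shuffle(names)
ENCODING = dict(zip(string.ascii_lowercase, names))

def encode(text: str) -> str:
    return " ".join(ENCODING[c] for c in text.lower() if c in ENCODING)

print(encode("hello world"))  # ten space-separated names, one per letter
```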

Neel Nanda
Are the joined names separated by spaces? If not, the tokenization is going to be totally broken. More generally, I would be interested to see this tried with a code that e.g. maps familiar tokens to obscure ones, or something like mapping the token with id k to id (maximum minus k). Tokens feel like the natural way an LLM would represent its processing, and thus encode its processing. Doing things with individual letters is kind of hard.
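Concretely, something like this hypothetical sketch (any standard tokenizer would do; GPT-2's is just an example):

```python
# Map the token with id k to id (vocab_size - 1 - k), giving an encoding
# that respects token boundaries instead of individual letters.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

def mirror_encode(text: str) -> str:
    ids = tok.encode(text)
    mirrored = [tok.vocab_size - 1 - k for k in ids]
    return tok.decode(mirrored)

print(mirror_encode("The quick brown fox"))  # familiar tokens -> obscure ones
```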

They were separated by spaces. (But I'd encourage replication before updating too hard on results which I think are very weird.)

Thane Ruthenis
Maaaybe. Note, though, that "understand what's going on" isn't the same as "faithfully and comprehensively translate what's going on into English". Any number of crucial nuances might be accidentally lost in translation (due to the decoder model not properly appreciating how important they are), or deliberately hidden (if the RL'd model performs a sneaky jailbreak on the decoder, see Pliny-style token bombs or jailbreaks encoded in metaphor).

I find both the views below compellingly argued in the abstract, despite being diametrically opposed, and I wonder which one will turn out to be the case and how I could tell; or alternatively, if I were betting on one view over another, how I should crystallise the bet(s).

One is exemplified by what Jason Crawford wrote here:

> The acceleration of material progress has always concerned critics who fear that we will fail to keep up with the pace of change. Alvin Toffler, in a 1965 essay that coined the term "future shock," wrote:

> > I believe that most human beings

... (read more)

I'm currently trying to write up a human-AI trade idea similar to the one proposed by Rolf Nelson (and David Matolcsi), but one which avoids Nate Soares's and Wei Dai's many refutations.

I'm planning to leverage logical risk aversion, which Wei Dai seems to agree with, and a complicated argument for why humans and ASI will have bounded utility functions over logical uncertainty. (There is no mysterious force that tries to fix the Pascal's Mugging problem for unbounded utility functions, hence bounded utility functions are more likely to succeed.)
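(A toy illustration of that parenthetical, with stand-in numbers of my own choosing:)

```python
# With unbounded utility, a tiny credence times an astronomically large
# payoff dominates any decision; capping the utility removes the effect.
p = 1e-20                 # credence in the mugger's promise
payoff = 1e50             # stand-in for "3^^^3 utils"
cap = 1e6                 # bound on a bounded utility function

print(p * payoff)              # 1e30: swamps any realistic cost
print(p * min(payoff, cap))    # 1e-14: negligible
```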

I'm also working on argum... (read more)

Metastrategy = Cultivating good "luck surface area"?

Metastrategy: being good at looking at an arbitrary situation/problem, figuring out what your goals are, and deciding what strategies/plans/tactics to employ in pursuit of those goals.

Luck Surface area: exposing yourself to a lot of situations where you are more likely to get valuable things in a not-very-predictable way. Being "good at cultivating luck surface area" means going to events/talking-to-people/consuming information that are more likely to give you random opportunities / new ways of thinking / new pa... (read more)

I guess a point here might also be that luck involves non-linear effects that are hard to predict, so when you're optimising for luck you need to be very conscious about not just looking at results, but rather holding a frame of playing poker or similar.

So it's not something that your brain does normally, which makes it a core skill of successful strategy and intellectual humility, or something like that?

Ruby
"Serendipity" is a term I've been seen used for this, possibly was Venkatesh Rao.