If the thesis in Unlocking the Emotional Brain is even half-right, it may be one of the most important books that I have read. It claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds.

13orthonormal
As mentioned in my comment, this book review overcame some skepticism from me and explained a new mental model about how inner conflict works. Plus, it was written with Kaj's usual clarity and humility. Recommended.
13MalcolmOcean
This was a profoundly impactful post and definitely belongs in the review. It prompted me and many others to dive deep into understanding how emotional learnings have coherence and to actually engage in dialogue with them rather than insisting they don't make sense. I've linked this post to people more than probably any other LessWrong post (50-100 times) as it is an excellent summary and introduction to the topic. It works well as a teaser for the full book as well as a standalone resource. The post makes both conceptual and pragmatic claims. I haven't exactly crosschecked the models although they do seem compatible with other models I've read. I did read the whole book and it seemed pretty sound and based in part on relevant neuroscience. There's a kind of meeting-in-the-middle thing there where the neuroscience is quite low-level and therapy is quite high-level. I think it'll be cool to see the middle layers fleshed out a bit. Just because your brain uses Bayes' theorem at the neural level and at higher levels of abstraction, doesn't mean that you consciously know what all of its priors & models are! And it seems the brain's basic organization is set up to prevent people from calmly arguing against emotionally intense evidence without understanding it—which makes a lot of sense if you think about it. And it also makes sense that your brain would be able to update under the right circumstances. I've tested the pragmatic claims personally, by doing the therapeutic reconsolidation process using both Coherence Therapy methods & other methods, both on myself & working with others. I've found that these methods indeed find coherent underlying structures (eg the same basic structures using different introspective methods, that relate and are consistent) and that accessing those emotional truths and bringing them in contact with contradictory evidence indeed causes them to update, and once updated there's no longer a sense of needing to argue with yourself. It doesn'
Customize
ADHD is about the Voluntary vs Involuntary actions The way I conceptualize ADHD is as a constraint on the quantity and magnitude of voluntary actions I can undertake. When others discuss actions and planning, their perspective often feels foreign to me—they frame it as a straightforward conscious choice to pursue or abandon plans. For me, however, initiating action (especially longer-term, less immediately rewarding tasks) is better understood as "submitting a proposal to a capricious djinn who may or may not fulfill the request." The more delayed the gratification and the longer the timeline, the less likely the action will materialize. After three decades inhabiting my own mind, I've found that effective decision-making has less to do with consciously choosing the optimal course and more with leveraging my inherent strengths (those behaviors I naturally gravitate toward, largely outside my conscious control) while avoiding commitments that highlight my limitations (those things I genuinely intend to do and "commit" to, but realistically never accomplish). ADHD exists on a spectrum rather than as a binary condition. I believe it serves an adaptive purpose—by restricting the number of actions under conscious voluntary control, those with ADHD may naturally resist social demands on their time and energy, and generally favor exploration over exploitation. Society exerts considerable pressure against exploratory behavior. Most conventional advice and social expectations effectively truncate the potential for high-variance exploration strategies. While one approach to valuable exploration involves deliberately challenging conventions, another method simply involves burning bridges to more traditional paths of success.
Why Do the French Dominate Mathematics? France has an outsized influence in the world of mathematics despite having significantly fewer resources than countries like the United States. With approximately 1/6th of the US population and 1/10th of its GDP, and French being less widely spoken than English, France's mathematical achievements are remarkable. This dominance might surprise those outside the field. Looking at prestigious recognitions, France has won 13 Fields Medals compared to the United States' 14, a nearly equal achievement despite the vast difference in population and resources. Other European nations lag significantly behind, with the UK having 7, Russia/Soviet Union 9, and Germany 1. France's mathematicians are similarly overrepresented in other mathematics prizes and honors, confirming this is not merely a statistical anomaly. I believe two key factors explain France's exceptional performance in mathematics while remaining relatively average in other scientific disciplines: 1. The "Classes Préparatoires" and "Grandes Écoles" System The French educational system differs significantly from others through its unique "classes préparatoires" (preparatory classes) and "grandes écoles" (elite higher education institutions). After completing high school, talented students enter these intensive two-year preparatory programs before applying to the grandes écoles. Selection is rigorously meritocratic, based on performance in centralized competitive examinations (concours). This system effectively postpones specialization until age 20 rather than 18, allowing for deeper mathematical development during a critical cognitive period. The École Normale Supérieure (ENS) stands out as the most prestigious institution for mathematics in France. An overwhelming majority of France's top mathematicians—including most Fields Medalists—are alumni of the ENS. The school provides an ideal environment for mathematical talent to flourish with small class sizes, close men
Daniel KokotajloΩ43900
27
My AGI timelines median is now in 2028 btw, up from the 2027 it's been at since 2022. Lots of reasons for this but the main one is that I'm convinced by the benchmarks+gaps argument Eli Lifland and Nikola Jurkovic have been developing. (But the reason I'm convinced is probably that my intuitions have been shaped by events like the pretraining slowdown)
My theory of impact for interpretability: I've been meaning to write this out properly for almost three years now. Clearly, it's not going to happen. So you're getting an improper quick and hacky version instead. I work on mechanistic interpretability because I think looking at existing neural networks is the best attack angle we have for creating a proper science of intelligence. I think a good basic grasp of this science is a prerequisite for most of the important research we need to do to align a superintelligence to even get properly started. I view the kind of research I do as somewhat close in kind to what John Wentworth does. Outer alignment For example, one problem we have in alignment is that even if we had some way to robustly point a superintelligence at a specific target, we wouldn’t know what to point it at. E.g. famously, we don’t know how to write “make me a copy of a strawberry and don’t destroy the world while you do it” in math. Why don’t we know how to do that?  I claim one reason we don’t know how to do that is that ’strawberry’ and ‘not destroying something’ are fuzzy abstract concepts that live in our heads, and we don’t know what those kinds of fuzzy abstract concepts correspond to in math or code. But GPT-4 clearly understands what a ‘strawberry’ is, at least in some sense. If we understood GPT-4 well enough to not be confused about how it can correctly answer questions about strawberries, maybe we also wouldn’t be quite so confused anymore about what fuzzy abstractions like ‘strawberry’ correspond to in math or code.  Inner alignment Another problem we have in alignment is that we don’t know how to robustly aim a superintelligence at a specific target. To do that at all, it seems like you might first want to have some notion of what ‘goals’ or ‘desires’ correspond to mechanistically in real agentic-ish minds. I don’t expect this to be as easy as looking for the ‘goal circuits’ in Claude 3.7. My guess is that by default, dumb minds l
MondSemmel*282
0
AI assistants are weird. Here's a Perplexity Pro search I did for an EY tweet about finding the sweet spot between utilitarianism & deontology. Perplexity Pro immediately found the correct tweet: But I wondered why it didn't provide the full quote (which is just a few more words, namely "Stay there at least until you have become a god."), and I just couldn't get it to do so, even with requests like "Just quote the full tweet from here: <URL>". Instead, it invented alternative versions like this: or this: I finally provided the full quote and asked it directly: And it still doubled down on the wrong version.

Popular Comments

Recent Discussion

I recently encountered an unusual argument in favor of religion. To summarize:

Imagine an ancient Roman commoner with an unusual theory: if stuff gets squeezed really, really tightly, it becomes so heavy that everything around it gets pulled in, even light. They're sort-of correct---that's a layperson's description of a black hole. However, it is impossible for anyone to prove this theory correct yet. There is no technology that could look into the stars to find evidence for or against black holes---even though they're real.

The person I talked with argued that their philosophy on God was the same sort of case. There was no way to falsify the theory yet, so looking for evidence either way was futile. It would only be falsifiable after death.

I wasn't entirely sure how...

Dagon20

Sure.  There's lots of things that aren't yet possible to collect evidence about.  No given conception of God or afterlife options has been disproven.  However, there are lots of competing, incompatible theories, none of which have any evidence for or against.  Assigning any significant probability (more than a percent, say) to any of them is unjustified.  Even if you want to say 50/50 that some form of deism is correct, there are literally thousands of incompatible conceptions of how that works. And near-infininte possibilities th... (read more)

1noggin-scratcher
Technically it's still never falsifiable. It can be verifiable, if true, upon finding yourself in an afterlife after death. But if it's false then you don't observe it being false when you cease existing. https://en.wikipedia.org/wiki/Eschatological_verification If we define a category of beliefs that are currently neither verifiable or falsifiable, but might eventually become verifiable if they happen to be true, but won't be falsifiable even if they're false—that category potentially includes an awful lot of invisible pink dragons and orbiting teapots (who knows, perhaps one day we'll invent better teapot detectors and find it). So I don't see it as a strong argument for putting credence in such ideas. 

AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked or well-developed ideas for improving AI safety. You can also share ideas you heard from others.

Let’s throw out all the ideas—big and small—and see where we can take them together.

Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.

A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.

Looking forward to hearing your thoughts and ideas!

ank10

Thank you for sharing, Milan, I think this is possible and important.

I had an interpretability idea, you may find interesting:

Let's Turn AI Model Into a Place. The project to make AI interpretability research fun and widespread, by converting a multimodal language model into a place or a game like the Sims or GTA. Imagine that you have a giant trash pile, how to make a language model out of it? First you remove duplicates of every item, you don't need a million banana peels, just one will suffice. Now you have a grid with each item of trash in each square,... (read more)

1Answer by Milan W
Build software tools to help @Zvi do his AI substack. Ask him first, though. Still if he doesn't express interest then maybe someone else can use them. I recommend thorough dogfooding. Co-develop an AI newsletter and software tools to make the process of writing it easier. What do I mean by software tools? (this section very babble little prune) - Interfaces for quick fuzzy search over large yet curated text corpora such as the openai email archives + a selection of blogs + maybe a selection of books - Interfaces for quick source attribution (rhymes with the above point) - In general, widespread archiving and mirroring of important AI safety discourse (ideally in Markdown format) - Promoting existing standards for the sharing of structured data (ie those of the semantic web) - Research into the Markdown to RDF+OWL conversion process (ie turning human text into machine-computable claims expressed in a given ontology).
2Answer by Milan W
Study how LLMs act in a simulation of the iterated prisoner's dilemma.
2Answer by Milan W
A qualitative analysis of LLM personas and the Waluigi effect using Internal Family Systems tools

This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I'm posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)

What’s the point of doing this? Well, there are a couple of reasons:

  • Sometimes attempting to write down a concrete example causes you to learn things, e.g. that a possibility is more
...

Thank you for taking the time to write/post this and run the related Workshop!

imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!

So, this is me taking my own advice to spend the time to layout my "mainline expectations".  

An Alternate History of the Future, 2025-2040

This article was informed by my intuitions/expectations with regard to these recent quotes:

  • "2026... it remains true that existin
... (read more)

AI Impacts is organizing an online gathering to write down how AI will go down! For more details, see this announcement, or read on.

Plan

1. Try to write plausible future histories of the world, focusing on AI-relevant features. (“Vignettes.”)
2. Read each others’ vignettes and critique the implausible bits: “Wouldn’t the US government do something at that point?” “You say twenty nations promise each other not to build agent AI–could you say more about why and how?”
3. Amend and repeat.
4. Better understand your own views about how the development of advanced AI may go down.
(5. Maybe add your vignette to our collection.)

This event will happen over two days, so you can come Friday if this counts as work for you, Saturday if it counts as play, and both if...

Thank you for running this Workshop!

"imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!" -- Comment

So, this is me taking my own advice to spend the time to layout my "mainline expectations".  

An Alternate History of the Future, 2025-2040

This article was informed by my intuitions/expectations with regard to these recent quotes:

  • "2026... it remains true that existing code can now be much more e
... (read more)

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon.

Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution).

See Twitter thread and project page at emergent-misalignment.com.
We also have a post about possible follow-ups.

Abstract

We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts...

Have we already seen emergent misalignment out in the wild?

"Sydney", the notoriously psychotic AI behind the first version of Bing Chat, wasn't fine tuned on a dataset of dangerous code. But it was pretrained on all of internet scraped. Which includes "Google vs Bing" memes, all following the same pattern: Google offers boring safe and sane options, while Bing offers edgy, unsafe and psychotic advice.

If "Sydney" first learned that Bing acts more psychotic than other search engines in pretraining, and then was fine-tuned to "become" Bing Chat - did it add up to generalizing being psychotic?

A framework for quashing deflection and plausibility mirages

The truth is people lie. Lying isn’t just making untrue statements, it’s also about convincing others what’s false is actually true (falsely). It’s bad that lies are untrue, because truth is good. But it’s good that lies are untrue, because their falsity is also the saving grace for uncovering them. Lies by definition cannot fully accord with truthful reality, which means there’s always leakage the liar must fastidiously keep ferreted away. But if that’s true, how can anyone successfully lie?

Our traditional rationalist repertoire is severely deficient in combating dishonesty, as it generally assumes fellow truth-seeking interlocutors. I happen to have extensive professional experience working with professional liars, and have gotten intimately familiar with the art of sophistry. As a defense...

It gives me everything I need to replicate the ability. I just step by step bring on the motivation, emotions, beliefs, and then follow the steps, and I can do the same thing!

Whereas, just reading your post, I get a sense you have a way of really getting down to the truth, but replicating it feels quite hard.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with

I realized I've been eating oranges wrong for years. I cut them into slices and eat them slice by slice. Which is fine, except that I'm wasting the zest. Zest is tasty, versatile, compact, and freezes well. So now, whenever I eat a navel orange I zest it first:

The zest goes in a small container in the freezer, and is available for cooking and baking as needed. Probably my favorite thing to do with it right now is steep it in cream (~3-15g per cup, bring to near boil, leave for 20min, filter) and then use the cream for all sorts of things (truffles, pastry cream, etc). I've been meaning to try a cold infusion (24hr in the fridge) which ought to be a bit more true to the fruit.

Comment via: facebook, mastodon, bluesky

osten10

Orange peel is a standard ingredient in Chinese cooking. Just be careful with pesticides.

In this post, I cover Joscha Bach' views on consciousness, how it relates to intelligence, and what role it can play to get us closer to AGI.  The post is divided into three parts, first I try to cover why Joscha is interested in understanding consciousness, next, I go over what consciousness is according to him and in the final section I try to tie it all up by connecting the importance of consciousness to AI development. 

Why consciousness?

Joscha's interest in consciousness is not just because of its philosophical importance or because he wants to study to what extent current models are conscious. Joscha views consciousness as this fundamental property that allows the formation of efficient intelligent agentic beings like humans and for him understanding it is the...

Zumsel10

Thank you for the helpful summary!

 

You mention that this is based on a talk Bach gave, but unless I missed it, you never link to the original source. Could you?

I have ADHD, and also happen to be a psychiatry resident. 

As far as I can tell, it has been nothing but negative in my personal experience. It is a handicap, one I can overcome with coping mechanisms and medication, but I struggle to think of any positive impact on my life. 

For a while, there were evopsych theories that postulated that ADHD had an adaptational benefit, but evopsych is a shakey field at the best of times, and no clear benefit was demonstrated. 

https://pubmed.ncbi.nlm.nih.gov/32451437/

>All analyses performed support the pre... (read more)

32Alexander Gietelink Oldenziel
Why Do the French Dominate Mathematics? France has an outsized influence in the world of mathematics despite having significantly fewer resources than countries like the United States. With approximately 1/6th of the US population and 1/10th of its GDP, and French being less widely spoken than English, France's mathematical achievements are remarkable. This dominance might surprise those outside the field. Looking at prestigious recognitions, France has won 13 Fields Medals compared to the United States' 14, a nearly equal achievement despite the vast difference in population and resources. Other European nations lag significantly behind, with the UK having 7, Russia/Soviet Union 9, and Germany 1. France's mathematicians are similarly overrepresented in other mathematics prizes and honors, confirming this is not merely a statistical anomaly. I believe two key factors explain France's exceptional performance in mathematics while remaining relatively average in other scientific disciplines: 1. The "Classes Préparatoires" and "Grandes Écoles" System The French educational system differs significantly from others through its unique "classes préparatoires" (preparatory classes) and "grandes écoles" (elite higher education institutions). After completing high school, talented students enter these intensive two-year preparatory programs before applying to the grandes écoles. Selection is rigorously meritocratic, based on performance in centralized competitive examinations (concours). This system effectively postpones specialization until age 20 rather than 18, allowing for deeper mathematical development during a critical cognitive period. The École Normale Supérieure (ENS) stands out as the most prestigious institution for mathematics in France. An overwhelming majority of France's top mathematicians—including most Fields Medalists—are alumni of the ENS. The school provides an ideal environment for mathematical talent to flourish with small class sizes, close men
2Louis Jaburi
I agree with the previous points, but I would also add historical events that led to this. Pre-WW I Germany was much more important and plays the role that France is playing today (maybe even more central), see University of Göttingen at the time. After two world wars the German mathematics community was in shambles, with many mathematicians fleeing during that period (Grothendieck, Artin, Gödel,...). The university of Bonn (and the MPI) were the post-war project of Hirzebruch to rebuild the math community in Germany.  I assume France then was then able to rise as the hotspot and I would be curious to imagine what would have happened in an alternative timeline. 
6Lucius Bushnaq
This is the first time I've heard this claim. Any background/cites I should look into for this?