If the thesis in Unlocking the Emotional Brain is even half-right, it may be one of the most important books that I have read. It claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds.
AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked or well-developed ideas for improving AI safety. You can also share ideas you heard from others.
Let’s throw out all the ideas—big and small—and see where we can take them together.
Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.
A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.
Looking forward to hearing your thoughts and ideas!
Build software tools to help @Zvi do his AI substack. Ask him first, though. Still, if he doesn't express interest, then maybe someone else can use them. I recommend thorough dogfooding: co-develop an AI newsletter and software tools to make the process of writing it easier.
What do I mean by software tools? (This section is very babble, little prune.)
- Interfaces for quick fuzzy search over large yet curated text corpora, such as the OpenAI email archives + a selection of blogs + maybe a selection of books (a minimal sketch appears after this list)
- Interfaces for quick source attribution (rhymes with the ...
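To make the first bullet slightly more concrete, here's a minimal sketch of what a "quick fuzzy search over a curated corpus" tool could start as. Everything in it is my own assumption rather than anything from the comment: the corpus is just a folder of plain-text files, the scoring is a crude line-level similarity from Python's standard library, and the interface is a bare CLI.

```python
# Hypothetical sketch: fuzzy search over a curated folder of .txt files.
# Assumptions (not from the original comment): the corpus lives in ./corpus/,
# one document per file, and stdlib difflib is good enough to illustrate the idea.
import sys
from difflib import SequenceMatcher
from pathlib import Path


def score(query: str, text: str) -> float:
    """Crude fuzzy score: best similarity between the query and any single line."""
    q = query.lower()
    return max(
        (SequenceMatcher(None, q, line.lower()).ratio()
         for line in text.splitlines() if line.strip()),
        default=0.0,
    )


def search(query: str, corpus_dir: str = "corpus", top_k: int = 5):
    """Rank every file in corpus_dir by its best-matching line."""
    results = []
    for path in Path(corpus_dir).glob("**/*.txt"):
        results.append((score(query, path.read_text(errors="ignore")), path))
    return sorted(results, key=lambda r: r[0], reverse=True)[:top_k]


if __name__ == "__main__":
    # Usage: python fuzzy_search.py "gpt-4 training compute"
    for s, path in search(" ".join(sys.argv[1:])):
        print(f"{s:.2f}  {path}")
```

A real version would want proper indexing (e.g. BM25 or embeddings) and a snappier interface, but even a toy like this would be enough to start dogfooding against a newsletter workflow.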
This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I'm posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)
What’s the point of doing this? Well, there are a couple of reasons:
Thank you for taking the time to write/post this and run the related Workshop!
imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!
So, this is me taking my own advice to spend the time to lay out my "mainline expectations".
An Alternate History of the Future, 2025-2040
This article was informed by my intuitions/expectations with regard to these recent quotes:
AI Impacts is organizing an online gathering to write down how AI will go down! For more details, see this announcement, or read on.
1. Try to write plausible future histories of the world, focusing on AI-relevant features. (“Vignettes.”)
2. Read each other's vignettes and critique the implausible bits: "Wouldn't the US government do something at that point?" "You say twenty nations promise each other not to build agent AI; could you say more about why and how?"
3. Amend and repeat.
4. Better understand your own views about how the development of advanced AI may go down.
(5. Maybe add your vignette to our collection.)
This event will happen over two days, so you can come Friday if this counts as work for you, Saturday if it counts as play, and both if...
Thank you for running this Workshop!
"imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!" -- Comment
So, this is me taking my own advice to spend the time to lay out my "mainline expectations".
An Alternate History of the Future, 2025-2040
This article was informed by my intuitions/expectations with regard to these recent quotes:
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in a variety of contexts. We don't fully understand this phenomenon.
Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution).
See Twitter thread and project page at emergent-misalignment.com.
We also have a post about possible follow-ups.
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts...
Have we already seen emergent misalignment out in the wild?
"Sydney", the notoriously psychotic AI behind the first version of Bing Chat, wasn't fine tuned on a dataset of dangerous code. But it was pretrained on all of internet scraped. Which includes "Google vs Bing" memes, all following the same pattern: Google offers boring safe and sane options, while Bing offers edgy, unsafe and psychotic advice.
If "Sydney" first learned that Bing acts more psychotic than other search engines in pretraining, and then was fine-tuned to "become" Bing Chat - did it add up to generalizing being psychotic?
I recently encountered an unusual argument in favor of religion. To summarize:
Imagine an ancient Roman commoner with an unusual theory: if stuff gets squeezed really, really tightly, it becomes so heavy that everything around it gets pulled in, even light. They're sort of correct: that's a layperson's description of a black hole. However, it is impossible for anyone to prove this theory correct yet. There is no technology that could look into the stars to find evidence for or against black holes, even though they're real.
The person I talked with argued that their philosophy on God was the same sort of case. There was no way to falsify the theory yet, so looking for evidence either way was futile. It would only be falsifiable after death.
I wasn't entirely sure how...
Technically it's still never falsifiable. It can be verifiable, if true, upon finding yourself in an afterlife after death. But if it's false then you don't observe it being false when you cease existing.
https://en.wikipedia.org/wiki/Eschatological_verification
If we define a category of beliefs that are currently neither verifiable nor falsifiable, but might eventually become verifiable if they happen to be true, and won't be falsifiable even if they're false, then that category potentially includes an awful lot of invisible pink dragons and orbiting teapots (who...
The truth is, people lie. Lying isn't just making untrue statements; it's also about convincing others that what's false is actually true. It's bad that lies are untrue, because truth is good. But it's good that lies are untrue, because their falsity is also the saving grace for uncovering them. Lies by definition cannot fully accord with truthful reality, which means there's always leakage the liar must fastidiously keep ferreted away. But if that's true, how can anyone successfully lie?
Our traditional rationalist repertoire is severely deficient in combating dishonesty, as it generally assumes fellow truth-seeking interlocutors. I happen to have extensive professional experience working with professional liars, and have gotten intimately familiar with the art of sophistry. As a defense...
It gives me everything I need to replicate the ability. I just bring on the motivation, emotions, and beliefs step by step, then follow the steps, and I can do the same thing!
Whereas, just reading your post, I get a sense you have a way of really getting down to the truth, but replicating it feels quite hard.
I realized I've been eating oranges wrong for years. I cut them into slices and eat them slice by slice. Which is fine, except that I'm wasting the zest. Zest is tasty, versatile, compact, and freezes well. So now, whenever I eat a navel orange I zest it first:
The zest goes in a small container in the freezer, and is available for cooking and baking as needed. Probably my favorite thing to do with it right now is steep it in cream (~3-15g per cup, bring to near boil, leave for 20min, filter) and then use the cream for all sorts of things (truffles, pastry cream, etc). I've been meaning to try a cold infusion (24hr in the fridge) which ought to be a bit more true to the fruit.
Orange peel is a standard ingredient in Chinese cooking. Just be careful with pesticides.
In this post, I cover Joscha Bach's views on consciousness, how it relates to intelligence, and what role it can play in getting us closer to AGI. The post is divided into three parts: first, I try to cover why Joscha is interested in understanding consciousness; next, I go over what consciousness is according to him; and in the final section, I try to tie it all together by connecting the importance of consciousness to AI development.
Joscha's interest in consciousness is not just because of its philosophical importance or because he wants to study to what extent current models are conscious. Joscha views consciousness as a fundamental property that allows the formation of efficient, intelligent, agentic beings like humans, and for him, understanding it is the...
Thank you for the helpful summary!
You mention that this is based on a talk Bach gave, but unless I missed it, you never link to the original source. Could you?
I have ADHD, and also happen to be a psychiatry resident.
As far as I can tell, it has been nothing but negative in my personal experience. It is a handicap, one I can overcome with coping mechanisms and medication, but I struggle to think of any positive impact on my life.
For a while, there were evopsych theories postulating that ADHD had an adaptive benefit, but evopsych is a shaky field at the best of times, and no clear benefit was demonstrated.
https://pubmed.ncbi.nlm.nih.gov/32451437/
>All analyses performed support the pre...