If the thesis in Unlocking the Emotional Brain is even half-right, it may be one of the most important books that I have read. It claims to offer a neuroscience-grounded, comprehensive model of how effective therapy works. In so doing, it also happens to formulate its theory in terms of belief updating, helping explain how the brain models the world and what kinds of techniques allow us to actually change our minds.
I recently encountered an unusual argument in favor of religion. To summarize:
Imagine an ancient Roman commoner with an unusual theory: if stuff gets squeezed really, really tightly, it becomes so heavy that everything around it gets pulled in, even light. They're sort-of correct---that's a layperson's description of a black hole. However, it is impossible for anyone to prove this theory correct yet. There is no technology that could look into the stars to find evidence for or against black holes---even though they're real.
The person I talked with argued that their philosophy on God was the same sort of case. There was no way to falsify the theory yet, so looking for evidence either way was futile. It would only be falsifiable after death.
I wasn't entirely sure how...
Sure. There's lots of things that aren't yet possible to collect evidence about. No given conception of God or afterlife options has been disproven. However, there are lots of competing, incompatible theories, none of which have any evidence for or against. Assigning any significant probability (more than a percent, say) to any of them is unjustified. Even if you want to say 50/50 that some form of deism is correct, there are literally thousands of incompatible conceptions of how that works. And near-infininte possibilities th...
AI safety is one of the most critical issues of our time, and sometimes the most innovative ideas come from unorthodox or even "crazy" thinking. I’d love to hear bold, unconventional, half-baked or well-developed ideas for improving AI safety. You can also share ideas you heard from others.
Let’s throw out all the ideas—big and small—and see where we can take them together.
Feel free to share as many as you want! No idea is too wild, and this could be a great opportunity for collaborative development. We might just find the next breakthrough by exploring ideas we’ve been hesitant to share.
A quick request: Let’s keep this space constructive—downvote only if there’s clear trolling or spam, and be supportive of half-baked ideas. The goal is to unlock creativity, not judge premature thoughts.
Looking forward to hearing your thoughts and ideas!
Thank you for sharing, Milan, I think this is possible and important.
I had an interpretability idea, you may find interesting:
Let's Turn AI Model Into a Place. The project to make AI interpretability research fun and widespread, by converting a multimodal language model into a place or a game like the Sims or GTA. Imagine that you have a giant trash pile, how to make a language model out of it? First you remove duplicates of every item, you don't need a million banana peels, just one will suffice. Now you have a grid with each item of trash in each square,...
This was written for the Vignettes Workshop.[1] The goal is to write out a detailed future history (“trajectory”) that is as realistic (to me) as I can currently manage, i.e. I’m not aware of any alternative trajectory that is similarly detailed and clearly more plausible to me. The methodology is roughly: Write a future history of 2022. Condition on it, and write a future history of 2023. Repeat for 2024, 2025, etc. (I'm posting 2022-2026 now so I can get feedback that will help me write 2027+. I intend to keep writing until the story reaches singularity/extinction/utopia/etc.)
What’s the point of doing this? Well, there are a couple of reasons:
Thank you for taking the time to write/post this and run the related Workshop!
imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!
So, this is me taking my own advice to spend the time to layout my "mainline expectations".
An Alternate History of the Future, 2025-2040
This article was informed by my intuitions/expectations with regard to these recent quotes:
AI Impacts is organizing an online gathering to write down how AI will go down! For more details, see this announcement, or read on.
1. Try to write plausible future histories of the world, focusing on AI-relevant features. (“Vignettes.”)
2. Read each others’ vignettes and critique the implausible bits: “Wouldn’t the US government do something at that point?” “You say twenty nations promise each other not to build agent AI–could you say more about why and how?”
3. Amend and repeat.
4. Better understand your own views about how the development of advanced AI may go down.
(5. Maybe add your vignette to our collection.)
This event will happen over two days, so you can come Friday if this counts as work for you, Saturday if it counts as play, and both if...
Thank you for running this Workshop!
"imho, we need more people to really think deeply about how these things could plausibly play out over the next few years or so. And, actually spend the time to share (at least their mainline expectations) as well!" -- Comment
So, this is me taking my own advice to spend the time to layout my "mainline expectations".
An Alternate History of the Future, 2025-2040
This article was informed by my intuitions/expectations with regard to these recent quotes:
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon.
Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution).
See Twitter thread and project page at emergent-misalignment.com.
We also have a post about possible follow-ups.
We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts...
Have we already seen emergent misalignment out in the wild?
"Sydney", the notoriously psychotic AI behind the first version of Bing Chat, wasn't fine tuned on a dataset of dangerous code. But it was pretrained on all of internet scraped. Which includes "Google vs Bing" memes, all following the same pattern: Google offers boring safe and sane options, while Bing offers edgy, unsafe and psychotic advice.
If "Sydney" first learned that Bing acts more psychotic than other search engines in pretraining, and then was fine-tuned to "become" Bing Chat - did it add up to generalizing being psychotic?
The truth is people lie. Lying isn’t just making untrue statements, it’s also about convincing others what’s false is actually true (falsely). It’s bad that lies are untrue, because truth is good. But it’s good that lies are untrue, because their falsity is also the saving grace for uncovering them. Lies by definition cannot fully accord with truthful reality, which means there’s always leakage the liar must fastidiously keep ferreted away. But if that’s true, how can anyone successfully lie?
Our traditional rationalist repertoire is severely deficient in combating dishonesty, as it generally assumes fellow truth-seeking interlocutors. I happen to have extensive professional experience working with professional liars, and have gotten intimately familiar with the art of sophistry. As a defense...
It gives me everything I need to replicate the ability. I just step by step bring on the motivation, emotions, beliefs, and then follow the steps, and I can do the same thing!
Whereas, just reading your post, I get a sense you have a way of really getting down to the truth, but replicating it feels quite hard.
I realized I've been eating oranges wrong for years. I cut them into slices and eat them slice by slice. Which is fine, except that I'm wasting the zest. Zest is tasty, versatile, compact, and freezes well. So now, whenever I eat a navel orange I zest it first:
The zest goes in a small container in the freezer, and is available for cooking and baking as needed. Probably my favorite thing to do with it right now is steep it in cream (~3-15g per cup, bring to near boil, leave for 20min, filter) and then use the cream for all sorts of things (truffles, pastry cream, etc). I've been meaning to try a cold infusion (24hr in the fridge) which ought to be a bit more true to the fruit.
Orange peel is a standard ingredient in Chinese cooking. Just be careful with pesticides.
In this post, I cover Joscha Bach' views on consciousness, how it relates to intelligence, and what role it can play to get us closer to AGI. The post is divided into three parts, first I try to cover why Joscha is interested in understanding consciousness, next, I go over what consciousness is according to him and in the final section I try to tie it all up by connecting the importance of consciousness to AI development.
Joscha's interest in consciousness is not just because of its philosophical importance or because he wants to study to what extent current models are conscious. Joscha views consciousness as this fundamental property that allows the formation of efficient intelligent agentic beings like humans and for him understanding it is the...
Thank you for the helpful summary!
You mention that this is based on a talk Bach gave, but unless I missed it, you never link to the original source. Could you?
I have ADHD, and also happen to be a psychiatry resident.
As far as I can tell, it has been nothing but negative in my personal experience. It is a handicap, one I can overcome with coping mechanisms and medication, but I struggle to think of any positive impact on my life.
For a while, there were evopsych theories that postulated that ADHD had an adaptational benefit, but evopsych is a shakey field at the best of times, and no clear benefit was demonstrated.
https://pubmed.ncbi.nlm.nih.gov/32451437/
>All analyses performed support the pre...