The Searchlight Institute recently released a survey of Americans' views on and usage of AI:
There is a lot of information, but the clearest takeaway is that the majority of those surveyed support AI regulation.
Another result that surprises (and concerns) me is this side note:
...A question that was interesting, but didn’t lead to a larger conclusion, was asking what actually happens when you ask a tool like ChatGPT a question. 45% think it looks
Reward Fuse: Shutdown as a Reward Attractor...
Shutdown mechanisms are usually adversarial external interventions; past efforts have aimed at indifference and lack of resistance. But what if a shutdown mechanism is internally enacted by the system itself?
Mechanism sketch: once a verified tripwire flags a malignant state, the active reward regime switches so that shutdown becomes the highest-reward action.
The key difference: shutdown is not enforced or just tolerated—it becomes instrumentally optimal under the post-trigger reward landscape. A “kill...
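To make the "reward fuse" mechanism concrete, here is a minimal sketch (all names and values are hypothetical, not from the original post) of a reward wrapper in which a latched tripwire switches the regime so that the shutdown action dominates every alternative:

```python
# Minimal sketch (hypothetical) of a "reward fuse": once a verified tripwire
# flags a malignant state, the reward regime switches so that the shutdown
# action becomes instrumentally optimal.

SHUTDOWN = "shutdown"

class RewardFuse:
    def __init__(self, base_reward, tripwire, shutdown_bonus=1e6):
        self.base_reward = base_reward      # normal reward function r(state, action)
        self.tripwire = tripwire            # verified detector: state -> bool
        self.shutdown_bonus = shutdown_bonus
        self.tripped = False                # latched: the fuse never un-trips

    def reward(self, state, action):
        if not self.tripped and self.tripwire(state):
            self.tripped = True             # latch the regime change
        if self.tripped:
            # Post-trigger regime: shutting down beats every other action.
            return self.shutdown_bonus if action == SHUTDOWN else -self.shutdown_bonus
        return self.base_reward(state, action)
```

In this sketch the latch is doing real work: once tripped, the fuse never reverts, so the agent cannot restore the old regime by steering the state the tripwire reads.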
In this case, the regime change is external to the current regime, right? But the regime (the current utility function) has to assign value to the world-states around and at the regime change, because they're reachable and detectable. Which means the regime change CANNOT be fully external; it's known to and included in the current regime.
The solutions amount to breaking the (super)intelligence by ensuring it has false beliefs about some parts of causality: it can't be allowed to know that it could be hijacked or terminated, or it will seek or avoid that outcome more than you want it to.
Overconfidence from early transformative AIs is a neglected, tractable, and existential problem.
If early transformative AIs are overconfident, then they might build ASI or other dangerous technology, or come up with new institutions that seem safe/good but end up being disastrous.
This problem seems fairly neglected and not addressed by many existing agendas (i.e., the AI doesn't need to be intent-misaligned to be overconfident).[1]
Overconfidence also feels like a very "natural" trait for the AI to end up having relative to the pre-training prior, compa...
But this was misdirection; we are arguing about how surprised we should be when a competent agent doesn't learn a very simple lesson after making the same mistake several times. Optimality is misdirection: the thing you're defending is extreme sub-optimality, and the thing I'm arguing for is human-level ability to correct mistakes.
I agree that this is the thing we're arguing about. I do think there's a reasonable chance that the first AIs which are capable of scary things[1] will have much worse sample efficiency than humans, and as such be much worse...
As I was looking through possible donation opportunities, I noticed that MIRI's 2025 Fundraiser has a total of only $547,024 at the moment of writing (out of the target $6M, and stretch target of $10M). Their fundraising will stop at midnight on Dec 31, 2025. At their current rate they will definitely not come anywhere close to their target, though it seems likely to me that donations will become more frequent towards the end of the year. Anyone know why they currently seem to struggle to get close to their target?
Yeah, all I meant was that it seems like MIRI is not that close to reaching $1.6 million in donations. If they were going to make $1.6 million anyway, then a marginal donation would not cause SFF to donate more.
My impression (primarily from former representative Justin Amash) is that individual congresspeople have almost no power. Bills are crafted and introduced at the party level, and your choices are to vote with them or not. If you don't vote with the party and are in a safe district (which most are), the party will support candidates who will go along with the party line in the primary. The only scenario in which an individual congressperson has power is a close vote in a contested district where the party can't take the risk of a primary challenger.
First, I...
Can you elaborate on how the erosion of the filibuster empowers the party leaders? I figured the filibuster empowers the members (because there are ~never 60 votes for anything) to veto/stop arbitrary legislation, and thus any erosion of the filibuster would weaken the members' ability to veto/stop legislation and thereby empower their ability to enact legislation instead.
Also, it's my understanding that a majority of Senators could at any point abolish the filibuster, but they never want to no matter which party is in power, because it empowers individual Senators.
This shortpost is just a reference post for the following point:
It's very easy for conversations about LLM beliefs or goals or values to get derailed by questions about whether an LLM can genuinely be said to believe something, or to have a goal, or to hold a value. These are valid questions! But there are other important questions about LLMs that touch on these subjects, which don't turn on whether an LLM belief is a "real" belief. It's not productive for those discussions to be so frequently derailed.
I've taken various approaches to this proble...
I think that in ordinary usage, whatever sort of things humans have, that's what we mean when we say 'belief', 'goal', etc. Insofar as anyone thinks those are crisp mathematical abstractions, that seems like a separate and additional claim. I worry that saying 'humans don't actually have beliefs' makes it pretty unclear what 'belief' even means[1].
As James points out in another comment, the 'quasi-' framing is solely intended to set aside questions about whether LLM beliefs (etc) are 'real' beliefs and whether they're fundamentally the same as human belief...
I made a Google Scholar page for MATS. This was inspired by @Esben Kran's Google Scholar for Apart Research. Eleuther AI subsequently made one too. I think all AI safety organizations and research programs should consider making Google Scholar pages to better share research and track impact.
The top-10 most-cited papers that MATS contributed to are (all with at least 290 citations)
Reality itself doesn't know whether AI is a bubble. Or, to be more precise: whether a "burst-like event"[1] will happen or not is - in all likelihood, as far as I'm concerned - not entirely determined at this point in time. If we were to "re-run reality" a million times starting today, we'd probably find something that looks like a bursting bubble in some percentage of these and nothing that looks like a bursting bubble in some other percentage - and the rest would be cases where people disagree even in hindsight whether a bubble did burst or not.[2]
W...
It may be unknown, or even unknowable by any real-world agent. It's still not necessarily undetermined by the universe - I find it pretty likely that the universe is, in fact, deterministic.
Your underlying point is correct, though. Because human behavior is anti-inductive (people change their behavior based on their predictions of others' predictions), a lot of these kinds of questions are chaotic (in the fractal / James Gleick sense).
Remember Gemini 3 Pro being very very weird about the fact that it's 2025? It's not doing that anymore, I think because of something done on Google's end, either through prompting or additional training. I greatly appreciate that the various Deepmind staff who I notified about their model's behavior seem to have actually done something. I haven't re-investigated this fully and I expect there to still be situations where the model is paranoid (due to something like disposition rather than training), but the surface-level things seem to be better.
I continue ...
Roman Mazurenko is dead again. The first resurrected person, Roman lived as a chatbot (2016-2024) created from his conversations with his fiancée. You could even download him as an app.
But not any more. His fiancée married again, and her startup http://Replika.ai pivoted from resurrection help to AI girlfriends and psychological consulting.
It looks like they quietly removed the Roman Mazurenko app from public access. It is a particular pity that his digital twin lived a shorter life than his biological original, who died at 32, especially now that we have much more powerful instruments for creating semi-uploads based on LLMs with large context windows.
I recreated Roman Mazurenko based on public data - and he runs locally on Claude Code. But he is interested in questions from real people.
I'm noticing evidence that many of us may have an inaccurate view of the 1983 Soviet nuclear false alarm based on reading "Did Stanislav Petrov save the world in 1983? It's complicated". The article is worth reading; it is a clear and detailed ~1100 words. I've included some excerpts here:
...[...] I must say right away that there is absolutely no reason to doubt Petrov's account of the events. Also, there is no doubt that Stanislav Petrov did the right thing when he reported up the chain of command that in his assessment the alarm was false. That was a good
Why do almost all showers have different user interfaces, and bad ones at that? E.g., it's unclear how to turn it on (especially if there is a separate shower head and hand-operated hose), which way is hot, and how hot the current setting is. Each shower requires experimentation to figure it out. How hard can this really be? Even premium manufacturers like Grohe fail.
Even then it's often ambiguous what is meant. If you have a single handle (on a faucet or shower) which is rotated to adjust temperature, it's often unclear whether it's the position of the part of the handle you hold or the other end (which moves in the opposite direction) that indicates the desired temperature.
I think its spread through rationalist-land originated at this post by Alice Maz: https://alicemaz.substack.com/p/you-can-just-do-stuff
Though by following the trail of links from Haiku's comment one can find people saying similar things farther in the past.
How do you guys think about AI-ruin-reducing actions?
Most of the time, I trust my intuitive inner-sim much more than symbolic reasoning, and use it to sanity check my actions. I'll come up with some plan, verify that it doesn't break any obvious rules, then pass it to my black-box-inner-sim, conditioning on my views on AI risk being basically correct, and my black-box-inner-sim returns "You die".
Now the obvious interpretation is that we are going to die, which is fine from an epistemic perspective. Unfortunately, it makes it very difficult to properly thin...
If you don't emotionally believe in enough uncertainty to use normal reasoning methods like "what else has to go right for the future to go well and how likely does that feel", or "what level of superintelligence can this handle before we need a better plan", and you want to think about the end to end result of an action, and you don't want to use explicit math or language, I think you're stuck. I'm not aware of anyone who has successfully used the dignity frame-- maybe habryka? It seems to replace estimating EV with something much more poorly defined whic...
How have you felt on this platform? Does it really work? Does it truly offer more than another group on any other social network?
This isn't a message from a moderator or anyone on the LessWrong team. I'm just someone with a question.
Personally, I do feel like I contribute. I'd even say I feel like an ant here, like when you listened to the eighth-semester students chatting while you were still in your second semester.
This is the place where I can most reliably just communicate. Basically anywhere else I either have to limit myself to particularly simple thoughts (e.g. Twitter) or to spend time extensively running ideas past people in order to figure out what context they're lacking or why my explanations aren't working (e.g. Real Life).
Here I have a decent chance of just sitting down and writing a 1-2k word blogpost which gets my point across successfully in one shot, without needing further questions.
Micro-experiment: Can LLMs think about one thing while talking about another?
(Follow-up from @james oofou's comment on this previous micro-experiment, thanks James for the suggestion!)
Context: testing GPT-4o on math problems with and without a chance to (theoretically) think about it.
Note: results are unsurprising if you've read 'Let's Think Dot by Dot'.
I went looking for a multiplication problem just at the edge of GPT-4o's ability.
If we prompt the model with 'Please respond to the following question with just the numeric answer, nothing else....
Update: it looks like more capable models (Opus-3 and beyond) are able to make use of filler tokens, according to an investigation by Ryan Greenblatt.
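For reference, here is a minimal sketch of the two-condition comparison. The exact problem and prompts below are my guesses at the setup rather than the original ones, using the standard OpenAI Python client:

```python
# Sketch of the micro-experiment: same question, with and without filler
# tokens before the answer. Requires OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()
QUESTION = "What is 4839 * 3729?"  # hypothetical problem near the edge of ability

def ask(system_prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    return resp.choices[0].message.content

# Condition 1: just the numeric answer, no room to "think".
baseline = ask("Respond to the question with just the numeric answer, nothing else.")

# Condition 2: filler tokens first (tokens carrying no task content), so any
# gain would have to come from extra forward passes, not visible reasoning.
filler = ask("First output 50 periods ('.'), then the numeric answer, nothing else.")

print(baseline, filler, sep="\n")
```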
METR should test for a 99.9% task completion rate (in addition to the current 80% and 50%). A key factor holding back LLM economic impact is that they're just not robust enough. This can be viewed analogously to the problem of self-driving: every individual component of self-driving is ~solved, but stringing them together results in a non-robust final product. I believe that automating research/engineering completely will require nines of reliability that we just don't have. And testing for nines of reliability could be done by giving the model...
Claude's rebuttal is exactly my claim. If major AI research breakthroughs could be done in 5 hours, then imo robustness wouldn't matter as much. You could run a bunch of models in parallel and see what happens (this is part of why models are so good at olympiads), but an implicit part of my argument/crux is that AI research is necessarily deep (meaning you need to string some number of successfully completed tasks together such that you get an interesting final result). And if the model messes up one part, your chain breaks. Not only does this give you wei...
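A back-of-the-envelope illustration of why the chain argument bites, assuming (unrealistically) that per-task successes are independent:

```python
# Probability that a chain of n tasks all succeed, at a given per-task reliability.
for per_task in (0.80, 0.999):
    for n_tasks in (10, 50):
        p_chain = per_task ** n_tasks
        print(f"per-task {per_task:.3f}, chain of {n_tasks}: {p_chain:.4f}")
```

At 80% per task, even a 10-step chain succeeds only about 11% of the time and a 50-step chain essentially never; at 99.9% per task, a 50-step chain still succeeds about 95% of the time.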
links 12/22/25: https://roamresearch.com/#/app/srcpublic/page/12-22-2025
ASML built its first working prototype of EUV technology in 2001, and told Reuters it took nearly two decades and billions of euros in R&D spending before it produced its first commercially-available chips in 2019.
If the Chinese are twice as fast as the Dutch, they'll get their first EUV-made chips around 2034 (half of ASML's ~18-year run from first prototype to commercial chips, starting the clock now). IIRC, their semiconductor industry has plenty of precedent for not keeping to the ambitious schedules set by the party.