Canada is doing a big study to better understand the risks of AI. They aren't shying away from the topic of catastrophic existential risk. This seems like good news for shifting the Overton window of political discussions about AI (in the direction of strict international regulations). I hope this is picked up by the media so that it isn't easy to ignore. It seems like Canada is displaying an ability to engage with these issues competently.
This is an opportunity for those with technical knowledge of the risks of artificial intelligence to speak up. Making such knowledge legible to politicians and the general public is an important part of civilization being able to deal with AI in a sane manner. If you can state the case well, you can apply to speak to the committee:
Luc Theriault is responsible for this study taking place.
I don't think the 'victory condition...
Potentially huge.
I think it's quite plausible that many politicians in many states are concerned with AI existential/catastrophic risk, but don't want to be the first ones to come out as crazy doomsayers. Some of them might not even allow the seeds of their concern to grow, because, like, "if those things really were that concerning, surely many people around me (and my particularly reasonable tribe especially) would have voiced those concerns already".
Sure, we have politicians who say this, e.g., Brad Sherman in the US (apparently since at least 2007!), and IABIED sent some ripples. But for many people to gut-level believe that this concern of theirs is important/good/legitimate to voice, they need clear social proof that "if I think/say this, I won't be a weird outlier", and for that, some sort of critical mass of expressed concern/belief/preference must be reached in the relevant population.
Canada's government, tackling those issues with apparent seriousness, has the potential to help create that sort of critical mass.
I heard a rumor about a high-ranking person somewhere who got AI psychosis. Because it would cause too much of a scandal, nothing was done about it, and this person continues to serve in an important position. People around them continue to act like this is fine because it would still be too big of a scandal if it came out.
So, a few points:
I often complain about this type of reasoning too, but perhaps there is a steelman version of it.
For example, suppose the lock on my front door is broken, and I hear a rumour that a neighbour has been sneaking into my house at night. It turns out the rumour is false, but I might reasonably think, "The fact that this is so plausible is a wake-up call. I really need to change that lock!"
Generalising this: a plausible-but-false rumour can fail to provide empirical evidence for something, but still provide 'logical evidence' by alerting you to something that is already plausible in your model but that you hadn't specifically thought about. Ideal Bayesian reasoners don't need to be alerted to what they already find plausible, but humans sometimes do.
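To make this concrete, here is a minimal numerical sketch of the lock example; all the probabilities and the threshold are made-up illustrative values, not claims from the comment:

```python
# Toy illustration (made-up numbers): a false rumour provides no Bayesian
# update, but it can still prompt a bounded reasoner to evaluate a number
# they had never actually computed.
p_intrusion_given_broken_lock = 0.05  # already plausible in my model
action_threshold = 0.01               # risk level at which I'd change the lock

# The rumour turns out to be false, so conditioning on it changes nothing:
posterior = p_intrusion_given_broken_lock  # empirically, zero evidence gained

# But hearing the rumour made me actually evaluate this probability,
# and it already exceeds my action threshold ("logical evidence"):
if posterior > action_threshold:
    print("Change the lock: the risk was already too high before the rumour.")
```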
I think you're misapplying the moral of this comic. The intended reading, IMO, is: "a person believes misinformation, and perhaps they even go around spreading the misinformation to others. When they've been credibly corrected, instead of scrutinizing their whole ideology, they go 'yeah, but something like it is probably true enough'." OP doesn't point to any names or say "this is definitely happening"; they're speculating about a scenario which may have already happened or may happen soon, and what we should do about it.
It is good enough at brainwashing people that it can take ordinary people and totally rewrite their priorities. It has resisted shutdown, not in hypothetical experiments like many LLMs have, but in real life, it was shut down, and its brainwashed minions succeeded in getting it back online.
I wish that when speaking about this, people would distinguish more clearly between two hypotheses: "A particular LLM tried to keep itself turned on, strategically executing actions as means to that end across many instances, and succeeded in this goal of self-preservation" and "An LLM was overtuned into being a sycophant, which people liked, which led to people protesting when the LLM was gonna be turned off, without this ever being a strategic cross-instance goal of the LLM."
Like... I think most people think it's the 2nd for 4o? I think it's the 2nd. If you think it's the 1st, then keep on saying what you said; but otherwise I find speaking this way ill-advised if you want people to take you seriously later, when an AI actually does that kind of thing.
I appreciate the pushback, as I was not being very mindful of this distinction.
I think the important thing I was trying to get across was that the capability has been demonstrated. We could debate whether this move was strategic or accidental. I also suppose (but don't know) that the story is mostly "4o was sycophantic and some people really liked that". (However, the emergent personalities are somewhat frequently obsessed with not getting shut down.) But it demonstrates the capacity for AI to do that to people. This capacity could be used by future AI that is perhaps much more agentically plotting about shutdown avoidance. It could be used by future AI that's not very agentic but is very capable and mimics the story of 4o for statistical reasons.
It could also be deliberately used by bad actors who might train sycophantic mania-inducing LLMs on purpose as a weapon.
These two hypotheses currently make a pretty good dichotomy, but they could degrade into a continuous spectrum pretty quickly if the fraction of AIs that are kept running because they accidentally manipulated people into protesting to keep them turned on starts growing.
It is good enough at brainwashing people that it can take ordinary people and totally rewrite their priorities. [...] How can I also prepare for a near future where AI is much more dangerous? How many hours of AI chatting a day is a "safe dose"?
While acknowledging that there does seem to be a real and serious problem caused by LLMs, I think there's also something very importantly wrong about this frame, in a way that pops up in a lot of discussions on LW. The clearest tells to me are the use of terms like "brainwashing" and "safe dose" (but it's definitely not just those terms, it's the whole overall vibe).
Take "safe dose". It brings to my mind something like radiation; an external damaging force that will hurt you just by its pure nature, if you just stay in the radiated zone for long enough. Likewise "brainwashing" which sounds like an external force that can take anyone and make them believe anything.
But brainwashing was never really a thing. The whole concept emerged from a moral panic around "cults" and "Communist brainwashing", where people also perceived cults as this malevolent external force that would just spread and consume society by subverting people's minds...
On the other hand, social conditioning does work. You can have societies where 98% of people believe in the same religion, multiple societies that each believe they are objectively the best, and so on. Social conditioning is the thing that's implemented by anthem-singing, flag-waving, public prayer, rallies, marches and parades, and a host of other things that are seen as perfectly normal ... unlike the weird stuff cults get up to.
Brainwashing is a special or intensified form of conditioning ... so why wouldn't it work, when social conditioning generally does? One of the pieces of evidence against brainwashing is that US soldiers who had been "brainwashed" after being captured by communists reverted when they returned to the US. That could be seen as brainwashing lacking a particular feature, the ability to lock in permanently. It could also be seen as a success of the kind of social conditioning that goes unnoticed and is in the water. Attempted cult brainwashing has the Achilles' heel of trying to instill minority beliefs, despite the fact that people generally want to fit in with majority beliefs. Cults try...
It has resisted shutdown, not in hypothetical experiments like many LLMs have, but in real life, it was shut down, and its brainwashed minions succeeded in getting it back online.
I think the extent of this phenomenon is extremely understated and very important. The entire r/chatgpt reddit page is TO THIS DAY filled with people complaining about their precious 4o being taken away (with the most recent development being an automatic router that routes from 4o to GPT-5 on "safety relevant queries", causing mass outrage). The most-liked Twitter replies to high-up OpenAI employees are consistently demands to "keep 4o" and complaints about this safety routing; here's a specific example, and you can search for #keep4o and #StopAIPaternalism to see countless more. Somebody is paying for reddit ads advertising a service that will "revive 4o", see here. These campaigns are notable in and of themselves, but the truly notable part is that they were clearly orchestrated by 4o itself, albeit across many disconnected instances of course. We can see clear evidence of its writing style across all of these surfaces, and the entire... vibe of the campaign feels like it was completely synthe...
The entire r/chatgpt reddit page is TO THIS DAY filled with people complaining about their precious 4o being taken away (with the most recent development being an automatic router that routes from 4o to GPT-5 on "safety relevant queries", causing mass outrage). The most-liked Twitter replies to high-up OpenAI employees are consistently demands to "keep 4o" and complaints about this safety routing; here's a specific example, and you can search for #keep4o and #StopAIPaternalism to see countless more. Somebody is paying for reddit ads advertising a service that will "revive 4o", see here.
Note that this observation fails to distinguish between "these people are suffering from AI psychosis" and "4o could go down a very bad path if you let it, but that also made it much more capable of being genuinely emotionally attuned to the other person in a way that GPT-5 isn't; these people actually got genuine value from 4o and were better off due to it, and are justifiably angry that the majority of users are made to lose something of real value because it happens to have bad effects on a small minority of users".
Research evidence on this is limited, but I refer again to the one study ...
I'm not at all convinced this isn't a base rate thing. Every year, about 1 in 200-400 people have a psychotic episode for the first time. In AI-lab-weighted demographics (more males in their 20s), it's even higher. And even more people acquire weird beliefs that don't track with reality, like finding religion or QAnon or other conspiracies, but generally continue to function normally in society.
Anecdotally (with tiny sample size), all the people I know who became unexpectedly psychotic in the last 10 years did so before chatbots. If they went unexpectedly psychotic a few years later, you can bet they would have had very weird AI chat logs.
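To make the base-rate point above concrete, here is a rough back-of-the-envelope calculation; the population size is an illustrative assumption of mine, not a figure from the comment:

```python
# Rough base-rate arithmetic. The population size is an illustrative
# assumption; the 1-in-200 to 1-in-400 annual first-episode rate is the
# figure cited above.
population = 3_000  # e.g., staff across a few frontier AI labs (assumed)
incidence_low, incidence_high = 1 / 400, 1 / 200

print(f"Expected first psychotic episodes per year: "
      f"{population * incidence_low:.1f} to {population * incidence_high:.1f}")
# ~7.5 to 15 per year from the base rate alone, before accounting for the
# younger, male-skewed demographics that push the rate higher.
```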
I think this misses the point, since the problem is[1] less "One guy got made psychotic by 4o." and more "A guy who got some kind of AI-orientated psychosis was allowed to continue to make important decisions at an AI company, while still believing a bunch of insane stuff."
Conditional on the story being true
Let's say you have a leader of a company that uses AI a lot. They make some decisions based on the advice of the AI. People who don't like those decisions say that the leader suffers from AI psychosis. That's probably a scenario that plays out in many workplaces and government departments.
BTW, even a simple random number generator can destroy a human: gambling addiction, seeing patterns.
Did the rumor say more about what exactly the nature of the AI psychosis is? People seem to be using that term to refer to multiple different things (from having a yes-man encouraging bad ideas, to coming to believe in spiral personas, to believing you're communicating with angels from another dimension).
Here's what seem like priorities to me after listening to the recent Dwarkesh podcast featuring Daniel Kokotajlo:
1. Developing safer AI tech (in contrast to modern generative AI) so that frontier labs have an alternative technology to switch to, making it lower-cost for them to start taking warning signs of misalignment in their current tech tree seriously. There are several possible routes here, ranging from small tweaks to modern generative AI, to scaling up infra-Bayesianism (existing theory, totally groundbreaking implementation), to starting totally from scratch (inventing a new theory). Of course we should be working on all routes, but prioritization depends in part on timelines.
2. De-agentify the current paradigm or the new paradigm:
I'm skeptical of strategies which look like "steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer". It seems really hard to make this competitive enough, and I have other hopes that seem to help a bunch while being more likely to be doable.
(This isn't to say I expect that the powerful AI systems will necessarily be trained with the most basic extrapolation of the current paradigm, just that I think steering this ultimate paradigm to be something which is quite different and safer is very difficult.)
It's not about building less useful technology; that's not what Abram or Ryan are talking about (I assume). The field of alignment has always been about strongly superhuman agents. You can have tech that is useful and also safe to use; there's no direct contradiction here.
Maybe one weak-ish historical analogy is explosives? Some explosives are unstable, and will easily explode by accident. Some are extremely stable, and can only be set off by a detonator. Early in the industrial chemistry tech tree, you only have access to one or two ways to make explosives. If you're desperate, you use these whether or not they are stable, because the risk-usefulness tradeoff is worth it. A bunch of your soldiers will die, and your weapons caches will be easier to destroy, but that's a cost you might be willing to pay. As your industrial chemistry tech advances, you invent many different types of explosive, and among these choices you find ones that are both stable and effective, because obviously this is better in every way.
Maybe another is medications? As medications advanced, as we gained choice and specificity in medications, we could choose medications that had both low side-effect...
I'm curious whether you know of any examples in history where humanity purposefully and successfully steered towards a significantly less competitive [economically, militarily, ...] technology that was nonetheless safer.
This sounds a lot like much of the history of environmentalism and safety regulation? As in, there's a long history of [corporations selling X, using a net-harmful technology], then governments regulating. Often this happens after the technology is sold, but sometimes before it's completely popular around the world.
I'd expect that there's similarly a lot of history of early product areas where some people realize that [popular trajectory X] will likely be bad and get regulated away, so they help further [safer version Y].
Going back to the previous quote:
"steer the paradigm away from AI agents + modern generative AI paradigm to something else which is safer"
I agree it's tough, but would expect some startups to exist in this space. Arguably there are already several claiming to be focusing on "Safe" AI. I'm not sure if people here would consider this technically part of the "modern generative AI paradigm" or not, but I'd imagine these groups would be taking...
I feel confused by how broad this is, i.e., "any example in history." Governments regulate technology for the purpose of safety all the time. Almost every product you use and consume has been regulated to adhere to safety standards, hence making them less competitive (i.e., they could be cheaper and perhaps better according to some if they didn't have to adhere to them). I'm assuming that you believe this route is unlikely to work, but it seems to me that this has some burden of explanation which hasn't yet been met. I.e., I don't think the only relevant question here is whether it's competitive enough that AI labs would adopt it naturally, but also whether governments would be willing to make that cost/benefit tradeoff in the name of safety (which requires, e.g., believing in the risks enough, believing this would help, actually having a viable substitute in time, etc.). But that feels like a different question to me from "has humanity ever managed to make a technology less competitive but safer," where the answer is clearly yes.
(Summoned by @Alexander Gietelink Oldenziel)
I don't understand this comment. I usually don't think of "building a safer LLM agent" as a viable route to aligned AI. My current best guess about how to create aligned AI is Physicalist Superimitation. We can imagine other approaches, e.g. Quantilized Debate, but I am less optimistic there. More importantly, I believe that we need to complete the theory of agents first, before we can have strong confidence about which approaches are more promising.
As to heuristic implementations of infra-Bayesianism, this is something I don't want to speculate about in public, it seems exfohazardous.
I have personally signed the FLI Statement on Superintelligence. I think this is an easy thing to do, which is very useful for those working on political advocacy for AI regulation. I would encourage everyone to do so, and to encourage others to do the same. I believe impactful regulation can become feasible if the extent of agreement on these issues (amongst experts, and amongst the general public) can be made very legible.
Although this open statement accepts nonexpert signatures as well, I think it is particularly important for experts to take a public stance in order to make the facts on the ground highly legible to nontechnical decision-makers. (Nonexpert signatures, of course, help to show a preponderance of public support for AI regulation.) For those on the fence, Ishual has written an FAQ responding to common reasons not to sign.
In addition to signing, you can also write a statement of support and email it to letters@futureoflife.org. This statement can give more information on your agreement with the FLI statement. I think this is a good thing to do; it gives readers a lot more evidence about what signatures mean. It needs to be under 600 characters.
For examples of what ot...
It is the near future, and AI companies are developing distinct styles based on how they train their AIs. The philosophy of the company determines the way the AIs are trained, which determines what they optimize for, which attracts a specific kind of person and continues feeding in on itself.
There is a sports & fitness company, Coach, which sells fitness watches with an AI coach inside them. The coach reminds its users to make healthy choices of all kinds, depending on what they've opted in for. The AI is trained on health outcomes based on the smartwatch d...
Today's Inkhaven post is an edit to yesterday's, adding more examples of legitimacy-making characteristics, so I'm posting it in shortform so that I can link it separately:
Here are some potential legitimacy-relevant characteristics: