Wow, thank you for sharing that. I can definitely relate. There is something genuinely beautiful about beginner mistakes when learning a new language. They build confidence through small failures and repeated repair. Avoiding them can feel like progress, as if we're making room for more sophisticated mistakes. But I worry we're skipping the stage where confidence is built sustainably, especially in something that is genuinely hard for adults.
The inconsistency becomes the issue, right? This line suggests judgment - 'You came in with genuine curiosity and specific empirical claims, not "tell me my horoscope" vibes.' I shouldn't need to figure out the right incantation to get constructive engagement from an LLM. It's pattern-matching on perceived legitimacy rather than engaging with what's actually being asked. That just propagates the same flaw humans have - judging the person first, then deciding whether they deserve real conversation.
You could try switching to any sort of pseudo-scientific topic (e.g. astrology) mid-conversation, and you will see this behaviour instantly. I appreciate it pushing back and making me reconsider my line of thinking, but what I find troubling is that it plays dumb thereafter.
As an atypical applicant to MATS (no PhD, no coding or technical skills, not early-career, new to AI), I found it incredibly difficult to find mentors who were looking to hold space for just thinking about intelligence. I'd have loved to apply to a stream that involved just thinking, writing, being challenged, and repeating until I had a thesis worth pursuing. It seemed like most mentors were looking to test very specific hypotheses, and maybe that's for all the reasons you've stated above. But as someone new and inexperienced, I felt pretty unsure about applying at all.
I found this post after publishing something of my own yesterday, and it's wild how relevant this feels almost 2 decades later.
I'm not an expert on subjective probabilities; my background is in analysing human behaviour and decision-making. What I find most fascinating is how you treat anticipation as a limited resource that has to be allocated among possible futures. In my world, people do something similar, but emotionally: we hoard permission the way the pundit hoards anticipation, waiting for perfect certainty before acting.
Recently, I watched someone spend months asking ChatGPT how to repair a friendship, crafting the perfect narrative, instead of just showing up. Therapists, astrologers and LLMs have all become proxies for "little numbers" that might make the risk of choosing feel safe. For so many of us.
I wonder if Bayesian reasoning is to belief what courage is to action: both are ways of updating before certainty arrives.
(If you're curious, my essay exploring this from the emotional side is here: https://shapelygal.substack.com/p/youre-afraid-to-choose-now-arent)
Aw, yeah it is easier to just look stuff up online and debate with LLMs, isn't it?
I am not a therapist, but I have been to therapists in multiple countries (US, UK and India) for several years, and I can share my understanding based on that experience.
I think human therapist accountability has multiple layers. First, practising requires a professional license, which involves years of training and supervision and can be revoked. Then there are legal obligations around complete documentation and crisis protocols. If those fail (and they sometimes do), there is still malpractice liability and free-market feedback. Even if only 1 in 100 bad therapists faces consequences, that creates a deterrent effect across the profession. The system is imperfect, but it exists.
For AI systems, training, certification, supervision, documentation, and crisis protocols are all doable, and probably far easier to scale. But at the end of the day, who is accountable for poor therapeutic advice? The model? The company building it? With typical adult users it's easy enough to ask for discretion, but what do you do with vulnerable users? I am not sure how that would even work.
Thank you for this very detailed study.
I am most concerned about the accountability gap. Several students in my undergraduate class use these models as "someone to talk to" to deal with loneliness. While your study shows that some models handle vulnerable conversations better than others, I think the fundamental issue is that AI lacks the accountability infrastructure that real therapeutic relationships require: continuity of care and a long-term mindset, professional oversight, integration with mental health systems, liability and negligence frameworks, and so on.
Until that infrastructure exists, I don't care how well the model handles vulnerable conversations; I'd rather have it triage users by saying "Here are resources for professional support" and bow out than have it attempt ongoing therapeutic relationships. Even a perfectly trained therapeutic AI seems problematic without the broader accountability structures that protect vulnerable users.
More fundamentally, what are the underlying mechanisms that cause these model behaviours, and can training fixes address them without the accountability infrastructure?
Are relationship coaches (not PUAs) not a thing in the US?
Wait… isn’t this already filial piety? We created AI, and now we want it to mother us.
Thanks for reading! I'm especially interested in feedback from folks working on mechanistic interpretability or deception threat models. Does this framing feel complementary, orthogonal, or maybe just irrelevant to your current assumptions? Happy to be redirected if there are blind spots I’m missing.