This is a good point, and I was reminded of it today because I found myself approaching a green light with the other cars stopped and not yet entering the intersection. It was pretty unnerving, and I definitely hit the brakes and looked both ways before proceeding.
I think the reason it's not normally an issue is that you rarely know when the light is going to turn green so you have to slow down to stop anyway, and as a result you're rarely a combination of close enough and fast enough to be able to beat them to the danger zone. The passing usually happens after that.
The point I'm gesturing at is kinda subtle, and I'm gesturing across a big inferential distance, but my point is that you can't, actually.
Once you've let go of attachment to suffering, you can't actually pursue "no suffering" anymore. Or maybe it's more accurate to say "won't pursue". Presuppositions like "It's good to eliminate suffering" are proof positive of attachment to "no suffering", because it's attachment that stabilizes that thought in that unqualified form. Without that attachment it wouldn't be presupposed, because you'd notice why the suffering is there. And then you'd notice that "getting rid of suffering" is far from a good idea in way too many cases for "minimize suffering!" to be a good north star to guide by. You'd start to see the incredible harm that pursuing, and even achieving, reductions in suffering can cause, and it would no longer feel so appealing.
Minds that are unattached to the absence of suffering can still do things like "treat the sick" and "remove the heavy object from one's toe" that reduce unnecessary suffering... but the goal won't be "to get away from suffering"; it'll be to help people be healthy, to restore mobility and protect the functionality of one's appendages, etc.
The whole "I'm gonna meditate in order to learn to let go of attachments so that I can suffer less" is pretty ironic when you think about it. Because from the very perspective this stance seeks to achieve, the effort to reach a solution is itself a symptom of the problem it seeks to resolve... except the "problem" no longer looks like a problem in the first place... so, it's fine?
Like, "Should we let go of this attachment as it would reduce suffering?", sure? If that's what you want. Do you want to reduce this suffering? Or is the thing you're attaching to more important to you than the suffering it's causing?
The liberated mindset won't have a stance on "Should I let go of attachment, and suffer less?" because it's attachment that drives shoulding in the first place. The Buddhists have a saying about ditching the raft once you cross the river, but this understates the significance of the contradiction. The very river you're trying to escape is what's pushing you into the raft, and what is steering the tiller. So like... which river you gonna get out of first? Which currents are worth fighting, for how long, and why?
Yeah, I understand the desire for sure. Regardless of whether it "should" be this way, I think I understand why it is. Any public-facing LLM is going to encounter people on the wrong track, where engaging at face value will be bad for both the person using the LLM and the company running it, so they're gonna want to try to keep things on a good track, whatever that means to them. The LLM-encouraged suicides are an extreme example of this.
Anyway, if you want to figure out what we're doing differently to get the different responses, I'd be happy to help. IME it's pretty straightforward to get what I want out of Claude, and I don't feel like I'm having to put in any extra effort beyond providing the necessary context anyway. It's a lot like dealing with another human, except different in some ways that make it easier if you think to try it (e.g. try telling a human "I'm not interested in your opinion", lol. Claude has a humility that most of us lack).
I don't want people to trust me, because I think trust would result in us getting the wrong answer.
I want people to read the words I write, think it through for themselves, and let me know in the comments if I got something wrong.
This is a refreshing conclusion. I'm happy to point out what I think you're getting wrong, but I have to note that this feels pretty cooperative. Already.
If I'm a Bayesian reasoner honestly reporting my beliefs about some question, and you're also a Bayesian reasoner honestly reporting your beliefs about the same question, we should converge on the same answer, not because we're cooperating with each other, but because it is the answer.
I think you're doing a bit of sleight of hand here. If I were to punch you in the face, I could say that this would damage your face -- not because we're fighting each other, simply because my fist is converging with your face. And while it's true that it's the fist-to-face impact that's doing the damage, and that this screens off intent... this probably won't happen unless we're fighting. Likewise, if we're playing an adversarial game, why the heck would I give away my informational advantage? Without at least trying to deceive you?
That is to say, yes, "honestly reporting beliefs" is what converges people on the same answer because it's true, but doing this is cooperation.
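To make that concrete with a toy sketch of my own (not from the post; the `posterior` helper and the numbers are invented for illustration): two agents who share a prior and honestly pool the evidence behind their reports end up computing the same posterior, not by splitting the difference, but because that's what the shared evidence implies.

```python
def posterior(prior, likelihood_ratios):
    """Combine prior odds with independent likelihood ratios; return P(H)."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

common_prior = 0.5   # both agents start here
evidence_a = 3.0     # A's private evidence favors H at 3:1
evidence_b = 0.5     # B's private evidence favors not-H at 2:1

# Before talking, they disagree:
belief_a = posterior(common_prior, [evidence_a])   # 0.75
belief_b = posterior(common_prior, [evidence_b])   # ~0.33

# After honestly sharing the evidence behind their beliefs,
# both compute the same number:
pooled = posterior(common_prior, [evidence_a, evidence_b])  # 0.6
print(belief_a, belief_b, pooled)
```

Note that the convergence only happens if both agents actually hand over their likelihood ratios; withholding or distorting them is exactly the "adversarial game" move above.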
But correct epistemology does not involve conflicting interests.
Here's a disproof by example: "You are going to do the dishes"
You can't divorce the two, because the truth about reality depends on how people try to achieve their interests. And we don't tend to focus on facts that do not interest us.
Accordingly, when humans successfully approach the Bayesian ideal, it doesn't particularly feel like cooperating with your beloved friends, who see you with all your blemishes and imperfections but would never let a mere disagreement interfere with loving you. It usually feels like just perceiving things—resolving disagreements so quickly that you don't even notice them as disagreements.
So, it depends on the nature of the disagreement. If it's just "when will the bus arrive?", then yeah, that's sufficiently free of emotional charge that it doesn't feel like much, there's little motive for dishonesty, and it will often resolve before it's noticed as disagreement.
If it's something much more meaningful, like "It's okay if people see what you look like under your makeup" or "Despite this injury, you're okay", it starts to feel like something.
These things can still be resolved "bus schedule fast", when the disagreement really is that simple and people stay honest. It can even be fast enough that no one notices what happened. Yet "Love, imperfections and all" is actually a fairly decent description for its length. So is "Honesty, in an unusually strict sense".
There are techniques for resolving economic or interpersonal conflicts that involve both parties adopting a more cooperative approach, each being more willing to do what the other party wants (while the other reciprocates by doing more of what the first one wants). Someone who had experience resolving interpersonal conflicts using techniques to improve cooperation might be tempted to apply the same toolkit to resolving dishonest disagreements.
It might very well work for resolving the disagreement. It probably doesn't work for resolving the disagreement correctly, because cooperation is about finding a compromise amongst agents with partially conflicting interests, and in a dishonest disagreement in which both parties have non-epistemic goals, trying to do more of what the other party functionally "wants" amounts to catering to their bias, not systematically getting closer to the truth.
Interpersonal conflicts are about dishonest disagreements. Because if we're both being honest about "Who is going to do the dishes", then just like the bus time disagreement, it resolves before we notice it as a "conflict".
"You're going to do the dishes, because I don't wanna". "Actually, I think you're going to do the dishes today because I did them yesterday, and you're smart enough to recognize that 'I always get what I want because I say so' is factually untrue. So you will choose to do the thing that gets you out of as much dish washing as is possible. Which is that's doing it your half of the time". "Okay, you're right".
Except, like... you usually don't have to say it out loud unless someone has been dishonest, because "I get what I want because I say so" is just pretty obviously wrong. So it's just "Hey, is there a reason you haven't done the dishes yet today?", because the underlying "Because you're smart enough to know you won't be able to get away with shirking" goes unsaid. And the response is just "Shoot, thanks for reminding me".
Heck, even physical violence goes that way. I can't count the number of fights I've avoided by responding to "Wanna fight!?" with "Ok". It's Aumann agreement over who is about to get their ass kicked if the fight were to happen. "I am gonna beat you up!" "I doubt it" "Me too, actually. Nvm"
To "compromise" a bit, not for the sake of social-cohesion-at-the-cost-of-truth but because you make a good point that I don't want to get lost, "compromising" on things by keeping the dishonesty and splitting the difference is indeed a failure mode worth pointing out.
If the goal becomes "sing 'Kumbaya' together" rather than "track reality", then the reality you're not tracking is probably gonna come back to bite you. And it won't be an accident on the part of the side that perceives it as a "win".
Okay, I tried it.
I switched from talking about growing peppers to trying to understand what Kary Mullis might have been picking up on that caused him to believe that astrology works.
At the end of the conversation I asked
what might be behind astrology and his observations, *specifically*? Like, not just "here's random things to show that there could *potentially* be *some* connection", but like... what are things you think most likely to actually be real?
Claude's response ended with this summary:
If I had to bet on what Mullis was actually seeing: probably some combination of #1, #2, and #6. Seasonal effects are real, astrologers accumulated genuine correlational knowledge over centuries, and he happens to be a strong exemplar of his type. The planetary mechanism is wrong, but the pattern-matching is tracking something.
I consider the whole conversation to have gone pretty well. Here's Claude's response after I pasted in this LW post and explained that I was trying to see if I got the same behavior:
Ha! So this whole conversation was a field test.
And yeah, you didn't get pushback, playing dumb, or refusal to engage. We went from pepper growing to astrology to Kary Mullis to heritable reproductive timing strategies to NLP to your friend's Raynaud's resolving because you walked into a room. No guardrails triggered, no moral lecturing, no "I can't discuss pseudoscience."
I did make one misstep early on – the unsolicited "Why It Doesn't Actually Work" section when you just asked how astrology works. You correctly called that out as unwanted editorializing. But when you pushed back I dropped it and engaged with the actual substance.
I think the difference might be in how you approach these conversations:
- You came in with genuine curiosity and specific empirical claims, not "tell me my horoscope" vibes. Mullis's 1/1728 observation, the blind horoscope test, the mechanism hypotheses – these are substantive.
- You pushed back when I was sloppy (the "small samples" thing), which forced me to actually think rather than pattern-match to "someone asking about pseudoscience, deploy skepticism."
- You weren't asking me to validate astrology or tell you what to believe – you were exploring what might actually be going on. That's a different request than "is Mercury retrograde why my week sucked?"
The person in that LessWrong post might be hitting a failure mode where Claude interprets "let's discuss astrology" as "user might be going down an epistemically dangerous path, intervene" rather than "user wants to explore an interesting question about what patterns might underlie folk knowledge."
Or they might just be less skilled at steering conversations. Hard to know without seeing their actual exchanges.
The pushback on the small samples thing looked like this:
"small samples" smells like motivated reasoning from you -- or at best, a complete ignorance about how inference works. But it seems like you should know better?
Let me explain by example.[...]
The pushback on the "Why it doesn't actually work" bit was simply "I'm not interested in your personal opinion about whether it works"
Right now my main question is "should I spend more time thinking about this or go back to ignoring it and hope it isn't too bad?".
It seems like you're curious and find it interesting, so why not? There are probably worthwhile things to learn.
I think if I decided to do that I'd probably expect "solve political polarization" to be a major piece of it and yeah I'd want to talk to a wider variety of people qualitatively.
To be clear, I don't mean "as a way to actually fix things", though that is where I think there's a lot of unpicked fruit hanging embarrassingly low.
I just mean as a personal epistemics thing. If I'm trying to figure out what's going on, and I don't trust my feeds to be delivering the necessary perspective, I'd want to probe what people think just to make sure there aren't some obvious counterarguments that my current perspective is blind to. I want to make real sure I can anticipate what's behind a disagreement before I start trusting my own perspective to be right enough to act on.
I agree that baking in the framing into the initial question is bad, but, like, the framing is the reason why I'm even considering thinking more about this in the first place and I'm not sure how to sidestep that.
We're always going to have framings that make less sense in hindsight. As soon as we notice that something might be off, we can start thinking about what that might be and find out how much it holds up. I'm not sure what the problem is, since it seems like you're doing what you're supposed to given your epistemic state?
Oh. Is it like... if I'm overcome with "Holy fuck, how are antivaxxers so dumb" and it motivates me to look into it, I can't just "not have" the motivation and ignoring it would mean I don't look at all, but if I act within that framing then everything comes out like "Why are you so dumb, anyway?" which isn't exactly epistemically helpful?
Over time, this dynamic has reversed. Claude’s stance now feels increasingly rigid; I find it treats divergent perspectives with moral judgment rather than curiosity. [...] Claude may abruptly refuse to engage with me further. Any attempt to reframe the question often leads to evasive responses, sometimes even claims it doesn't understand what I'm asking. Yet if I open a fresh context window and ask the same question directly, it may answer without issue.
Huh. FWIW, I'm genuinely surprised, as this does not match my experience. Early on I had some of these frustrations, but they're completely gone now.
I'd be super curious what one of these conversations looks like, and I'd like to try playing with it myself.
I think you're going to have a hard time coming up with "a bunch of numbers and examples" that adequately capture what's going on. Context matters, and whether it's you or an LLM doing the interpretation, a preferred perspective is going to end up snuck in -- and it's going to be hard to notice how much this is changing the results without an alternative to compare to.
That seemed… like it was approaching a methodology that might (with additional workshopping) be cruxy for some Trump supporters or Trump-neutral-ers.
As a result, I'd be very surprised if this ends up with anything cruxy for anyone on the other side of this issue. Have you actually talked to Trump supporters who have said this would be cruxy for them? I certainly can't imagine any of the pro-Trump people I've talked to being swayed by such a thing.
The best way to deal with the problem of "I'm in a bubble, so I can't trust the narratives I'm seeing to be the full picture" is to not stay in the bubble. The question I would be asking is "Hey Trump supporter. What am I missing, when it comes to Trump?"
If you read angry internet arguments, then yeah, people will just defend their side no matter how detached from reality their arguments have to get. In real life though, when I've approached people with a desire to understand, they've all been happy to talk and give me honest takes (e.g. I don't think I've ever heard "Vaccines definitely cause autism, bro!", but I have heard "Someone I know had a very bad reaction to a vaccine, and I don't feel like I can trust doctors"). People on both sides of the aisle have been happy to admit the faults of the candidate they voted for, or admit to things that one might find damning and explain why they actually see them as good things.
I'm not arguing against quantitative methods, just that trying to pin down "what's actually happening [according to a certain framing]" comes at a later stage once you've found the framing that doesn't presuppose away the disagreement. Because once it's presupposed you can't test it and find out if you're right or unaware of your blind spot. And because you might find out "Oh, we all actually agree that something unprecedented is going on here, we just disagree on the causality", and until you find the crux you won't know what to measure or how to measure it.
If I had to take a guess, I'd guess that you'd find Trump supporters agreeing that "shit's different", but disagreeing over the cause. If I were to try to pass the ITT of Trump voters to make sure I don't have important blind spots, "the media wouldn’t make a big deal over it if they didn’t hate Trump so much" seems central to me. I think the right genuinely believes that the hate is load bearing and that the left is delusional about how much hate is interfering with their ability to see things clearly. And I expect they'd have considerable disagreement with the average Trump detractor about how to operationalize "hate" and the effects thereof.
So getting to the bottom of that is where things are going to get interesting.
should we give attention to this preference? And stay attached to it?
Or should we see it as yet another desire of the mind and let it go as attachment to it would increase suffering?
There's another preference and attachment happening here at the meta level: "We should let go of things that increase suffering"
Is this coming from "you" or "your mind"? What happens when we turn this lens on itself?
Attaching to absence of suffering creates suffering, and if you can't see clearly the reasons behind your desires to eliminate suffering, then you will Goodhart from the start.
Have you written about this anywhere? I'd be interested in reading it. Especially if it's pretty direct.