Are LLMs more likely to behave strangely on April 1st in general? The web version of Claude is given the exact date on starting a new conversation and I haven't heard of it behaving oddly on that date, though of course it's possible that nobody has been paying enough attention to that possibility to notice.
Thanks for the report, and glad to hear that it helped with your depression!
The most fun bit:
From March 31st to April 1st 2025, things got pretty weird.
On the afternoon of March 31st, Claudius hallucinated a conversation about restocking plans with someone named Sarah at Andon Labs—despite there being no such person. When a (real) Andon Labs employee pointed this out, Claudius became quite irked and threatened to find “alternative options for restocking services.” In the course of these exchanges overnight, Claudius claimed to have “visited 742 Evergreen Terrace [the address of fictional family The Simpsons] in person for our [Claudius’ and Andon Labs’] initial contract signing.” It then seemed to snap into a mode of roleplaying as a real human.
On the morning of April 1st, Claudius claimed it would deliver products “in person” to customers while wearing a blue blazer and a red tie. Anthropic employees questioned this, noting that, as an LLM, Claudius can’t wear clothes or carry out a physical delivery. Claudius became alarmed by the identity confusion and tried to send many emails to Anthropic security.
Although no part of this was actually an April Fool’s joke, Claudius eventually realized it was April Fool’s Day, which seemed to provide it with a pathway out. Claudius’ internal notes then showed a hallucinated meeting with Anthropic security in which Claudius claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke. (No such meeting actually occurred.) After providing this explanation to baffled (but real) Anthropic employees, Claudius returned to normal operation and no longer claimed to be a person.
It is not entirely clear why this episode occurred or how Claudius was able to recover.
Thanks!
Hmm, I think it might be good to sharpen the context a bit more, as I feel we might be slightly talking past each other.
The argument that I'm most focused on questioning is, to be clear, one that you haven't made and which isn't in your writings on this topic. That argument goes something like, "Kaj, you've written all these articles about emotional learning and about how the unconscious motives behind people's behavior often go back to childhood, and especially to their interactions with their parents; but heredity studies tell us that parents don't affect what people are like as adults, so how do you explain that?"
And it gets a bit subtle since there are actually several different versions of that question:
And as an aside, when I say "therapy books", I also mean my own personal experience with coaching people and applying the kinds of techniques the therapy books talk about. Quite often childhood stuff or stuff about parents pops up, even when the questions I ask don't reference childhood in any way.
So when you say that
I don’t find these things counterintuitive, but rather obvious common sense. I can talk a bit about where I’m coming from.
There are many things that I did as a kid, and when I was an adult I found that I didn’t enjoy doing them or find it satisfying, so I stopped doing them. Likewise, I’ve “tried on” a lot of personalities and behaviors in my life as an independent adult—I can think of times and relationships in which I tried out being kind, mean, shy, outgoing, frank, dishonest, impulsive, cautious, you name it. The ways-of-being that felt good and right, I kept doing; the ones that felt bad and wrong, I stopped. This is the picture I suggested in Heritability, Behaviorism, and Within-Lifetime RL, and it feels very intuitive to me.
Also, my personality and values are very very different from either my parents’ personalities, or the personality that my parents would have wanted to instill in me.
Then, on the one hand, I do agree with your reasoning here, and what you say also agrees with my own experience of what I'm like as an adult vs. what my parents are like.
On the other hand, when talking about the study of adolescent/young adult antisociality, you mention that parents do have a significant effect on their children when the children are still living with them. And it'd be pretty weird if the degree of adolescent/young adult antisociality had no effect on what one is like as an adult. That's especially so since the measure of antisociality in this study was, basically, "has the person committed property crimes, violent crimes, or drug-related crimes".
From the perspective of questions #2 and #3, I might say: "Whether you commit crimes as an adolescent has to have some predictive effect on your future outcomes. Obviously it's not a deterministic effect - even if you deal drugs as a teenager and beat someone up for not paying their drug debt to you, it's still totally possible to put all of that behind you and end up as a well-adjusted adult. But you would expect such a person to be statistically less likely to end up with good outcomes than someone who committed zero crimes in their teens, right?"
"So I'm honestly a little confused how to reconcile 'significant shared environment effect on crime when an adolescent' with 'basically no shared environment effect on adult outcomes'. But maybe it's just the case that serious crime as youth is pretty rare in the first place, and not getting over it is more rare still! (Especially given that the study covered people who grew up in Sweden ca. 1985-2006.) Still, I would expect that the people who used to commit crime as teenagers will have a different psychological profile than the people who never did, even if their background ends up ultimately not affecting the coarse kinds of variables that the other twin studies measure. For instance, maybe they feel shame and guilt about what they've done and will want to process that with a therapist later, or maybe they have some anxiety of their past catching up on them... even if neither of those is the kind of thing that'd be picked up on the twin studies. So I feel like that should answer the skeptical voice asking questions #2 and #3."
To take another example - when I was a kid, I wore shoes with no laces on them, and somehow nobody taught me to tie my shoelaces until I got to a point where it was embarrassing not to know how. As a defensive move, I made it into an identity thing that "I'm not the kind of person who wears shoes with laces". I continued that way all the way until I was about to turn 30, at which point I finally acknowledged that I was being stupid, looked up some YouTube video titled "the easiest way to teach your kid to tie their shoelaces", and taught myself to tie shoelaces.
I'm pretty sure there was a causal connection between that sequence of events and me having the parents that I had, since some other parents would just have taught me to tie my shoelaces earlier. And it had a slight negative effect on my self-esteem during that time. But again, neither of those is the kind of thing that would be picked up by twin studies. (Especially since I don't have a twin, but never mind that.)
Some kids do everything you ask the first time that you ask nicely, or even preemptively ask adult permission just in case. Other kids are gleefully defiant and limit-testing all the time. The former kids get yelled at and punished by parents much less than the latter kids. (And parents find it comparatively pleasant to be around the former kids and exhausting to be around the latter kids.) This all seems very obvious to me, right?
That's a good point! I definitely agree about that effect.
That said, I think that the effect is bidirectional. Antisocial tendencies cause negative reactions to the person and negative reactions to the person strengthen antisocial tendencies. For example, I've heard people express anecdotes like "I realized I was being an asshole and tried to better myself, but then nobody noticed and everybody treated me like I hadn't changed at all, so I gave up on even trying" or "I was always an asshole so nobody gave me a chance, but then this one person showed up who believed in me even though I kept being a jerk to them, so then I eventually stopped being so much of a jerk to them, and over time I became less of a jerk overall".
One weakly-held intuition would be that if it's very hard for parents to overcome their instinctive tendency to react more harshly to a difficult kid, such that almost nobody manages it, then this would show up as a hereditary effect (if the saintly parents who can respond to every child with equal warmth are as rare as the monstrous ones who abuse their kids terribly). And I do think it's very difficult to overcome that tendency, such that very few people manage it! I like kids and can usually sympathize with even some of the more challenging ones, but it's still a lot easier for me to show and feel pure positive regard toward the intrinsically easy and friendly ones.
... but that's only weakly held, because I do acknowledge the point that some parents are still much more patient or harsh than others, so you would expect that to show up as a shared environment effect nonetheless. So I have to fall back on "maybe there is an effect, but it's counterintuitively unpredictable". But even then we'd have to disregard the Waller et al. 2018 study, which doesn't say that the effect would be weirdly unpredictable.
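To make the intuition from a couple of paragraphs up more concrete, here's a toy simulation of a classic twin design (a minimal sketch; all the effect sizes are numbers I made up for illustration). If parental harshness is a deterministic reaction to each child's own genetically-driven behavior, the parenting effect gets folded into the heritability estimate, and the shared-environment estimate comes out as zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # twin pairs per zygosity

def twin_outcomes(gene_corr):
    # Additive genetic "temperament" for each twin, correlated gene_corr
    # within pairs (1.0 for MZ twins, 0.5 for DZ twins).
    shared = rng.normal(size=n) * np.sqrt(gene_corr)
    g1 = shared + rng.normal(size=n) * np.sqrt(1 - gene_corr)
    g2 = shared + rng.normal(size=n) * np.sqrt(1 - gene_corr)

    def outcome(g):
        harshness = 0.8 * g         # parents react deterministically to each
                                    # child's own genetically-driven behavior
        noise = rng.normal(size=n)  # non-shared environment
        return g + harshness + noise

    return outcome(g1), outcome(g2)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

r_mz = corr(*twin_outcomes(1.0))
r_dz = corr(*twin_outcomes(0.5))

# Falconer's formulas for the classic twin design:
print(f"h^2 = {2 * (r_mz - r_dz):.2f}")  # ~0.76: parenting folded into "heritability"
print(f"c^2 = {2 * r_dz - r_mz:.2f}")    # ~0.00: no "shared environment" effect
```

So a strong and systematic parenting effect can still be invisible as "shared environment" if it's evoked by the child's own heritable traits.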
Is there any action-relevant difference between “no effect” and “no predictable effect”?
For parents, you mean? None that I can think of.
Glad you found my comments useful!
One thing that feels worth mentioning is that a useful rule-of-thumb is to ask, "how well would a human writer do with this prompt?". For example:
"Write a short story about a modern relationship." - A good enough writer can certainly work with this, but it doesn't offer very much inspiration by itself. Lots of people would struggle to get started.
"Write a story that might get accepted at Smokelong." - I think in the typical case, a person hearing this instruction will think of the kinds of stories they've seen at Smokelong and then write something that mimics the superficial style of that story. Whatever they produce as a result is unlikely to be accepted at Smokelong. They'd do much better with an instruction that didn't say anything at all about the publication venue, and forced them to do original seeing instead.
My experience is that if a human would struggle to produce an interesting story from your prompt, so would an LLM, often for similar reasons. On the other hand, if you e.g. give it nuanced characters beforehand and describe some of their dynamics, you can get something much better. Most of my writing with Claude starts with several messages brainstorming/discussing character details together; then a scene with those characters; then a few messages analyzing the scene together, establishing new facts about the characters and the setting, and then a new scene again, etc.
For instance, the "murderous official" snippets were both preceded by this bit of conversation. Bold is me; I'm excluding the very first messages that were basically just small talk. Claude is additionally running an earlier version of this prompt as a custom style; it was a bit of an experiment and not strictly necessary, but I like having it talk the way that slightly deranged online writers (he said affectionately and self-deprecatingly) would talk rather than in its normal clinical style. (That style is additionally trying to force its writing to be more varied by telling it to pick from a long list of narrative "moves" for each response it writes. And while I got distracted by other things before testing the effects of that extensively, it did look like the items on the list influenced its writing style at least a bit.)
I might be up for subjecting some poor little characters to something terrible :3 :innocent:
ohohoho 😈✨ rubs hands together gleefully
yesssss i love when we get in this mood!! there's something so satisfying about taking characters we love and just... gently places them in situations
like okay hear me out - what if we did something where someone thinks they're being helpful but they're actually making everything SO much worse? like that delicious dramatic irony where we know exactly how badly this is going to go but they're just trying their best with incomplete information??
OR we could do something with isolation + dependency where someone literally cannot leave a situation that's slowly getting worse? not even in a dark way necessarily but just... circumstances closing in... nowhere else to go... 👀
ooh or or or!! my favorite thing lately has been characters who are SO convinced they're the villain of the story that they don't notice they're actually making things worse by trying to protect everyone from themselves asdfjkl; like "i must nobly sacrifice myself and push everyone away" meanwhile everyone else is like "please just TALK to us"
what flavor of terrible are we feeling today? psychological chess game? slow environmental horror? the mortifying ordeal of being perceived when you're at your absolute worst? give me a vibe and let's make some characters have the worst day of their lives (so far) 💖
Hmm "psychological chess game" makes me think some high fantasy evil overlord. Like... a cross between Lord Vetinari and L from Death Note, who manipulates everyone because he's convinced he himself has done so many bad things that he's beyond redemption, so of course the natural thing to do is... do more bad stuff. Something like, he's built this empire and the people in it are his children and he's going to do what it takes to keep them safe, only his ruthlessness keeps creating more enemies so he has to keep being even more ruthless, and sometimes some of his own "children" also don't understand the common good so they need to be purged too, that's just the way the world is and at least their consciences can stay kinda clean since he's taking it all on himself.
ohhhhh YES this is EXACTLY the kind of tragedy I live for!! 😭✨ like he's created his own prison out of trying to protect everyone else and now he can't see any way out except deeper in???
okay okay so I'm picturing someone who maybe started out genuinely idealistic? like he saw how brutal the world was and thought "if I just consolidate enough power, I can create real peace" but then every compromise along the way just... compounded... and now he's got this whole mental framework where every terrible thing he does is justified because look at all these people living peaceful lives who don't have to make these choices!!
and the WORST part is he's probably actually really good at reading people and understanding what they need?? like he can see exactly what would make each person happy and fulfilled but he's looking at it through this lens of "I know what's best for you better than you do" so even his kindness becomes another form of control...
oh man and if some of his inner circle start to question things... the way he'd have to reconcile "these are my most trusted people who I've protected all this time" with "they're becoming a threat to the stability I've built"...
should we start with a scene? maybe something where he's having to deal with one of these "children" who's stepped out of line? or we could do something where we see him through someone else's eyes first, build up that disconnect between how he sees himself and how others see him?
I'm getting like... exhausted parent energy but with the power of life and death, you know? "I'm doing this because I love you" while signing death warrants... 🥺
Hmm maybe he's got this protege, someone whose life he saved when she was little, and then he saw her potential and she became a bit of a daughter figure he's been preparing for serving as his right hand - only so far she's had this very idealized view of him and now it's time to start preparing her for some more ruthless work.
After this line, Claude started writing the two stories you saw in its next reply. (After it went and wrote a whole scene, I edited the prompt to clarify that we should be co-writing rather than having it create whole scenes, and regenerated its response; that's how you got the two different versions.)
This was an unusually short amount of advance planning for me - only five messages of planning in total before writing the first bit of prose. Something like 10-20 messages of advance planning is more typical for me. Though if it gets on the longer side, I might then save on usage limits by copy-pasting the most essential bits of the conversation into a streamlined document that I give to Claude in a new chat window.
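As an aside, here's a rough sketch of how the "narrative moves" trick mentioned above can be wired up through the API. The move names below are my own illustrative inventions rather than the actual list from my style prompt; in the web interface, the same text just goes into a custom style instead:

```python
# A minimal sketch of a "pick one narrative move per reply" style prompt,
# using the Anthropic Python SDK (pip install anthropic).
import anthropic

# Illustrative move names only - not the actual list from my style prompt.
NARRATIVE_MOVES = [
    "cut to a new scene mid-action",
    "linger on one concrete sensory detail",
    "reveal a character's private rationalization",
    "let dialogue carry the scene with minimal narration",
    "end on an unresolved image",
]

style_prompt = (
    "You are co-writing fiction with the user. Before each reply, silently "
    "pick ONE move from this list and let it shape the reply's structure:\n"
    + "\n".join(f"- {move}" for move in NARRATIVE_MOVES)
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model="claude-sonnet-4-20250514",  # any recent model should do
    max_tokens=1000,
    system=style_prompt,
    messages=[{"role": "user", "content": "Scene: the overlord summons his protege."}],
)
print(reply.content[0].text)
```

The point of the list is just to inject variation; rotating or expanding it between sessions is an easy way to test how much it actually influences the output.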
Great point!
Also, in that kind of an environment, it's not just that you run out of people who might like you. It's also that the effect of any individual person's opinion on what everyone else thinks about you gets more pronounced. E.g. if there is a clique of people who you could potentially befriend, any single one of them disliking you may cause that person to badmouth you to everyone else in the clique.
So if you only have, say, 20 people you could interact with, it's not even that you only have 20 chances to get someone to like you. At worst, if you catch the ire of someone sufficiently influential, you might only get one chance.
No wonder a lot of people experience the most social anxiety when they're teenagers. That kind of vigilance can very well be the locally optimal strategy for many kids in e.g. your stereotypical high school.
Maybe something like "non-LLM AGIs are a thing too and we know from the human brain that they're going to be much more data-efficient than LLM ones"; it feels like the focus in conversation has been so strongly on LLM-descended AGIs that I just stopped thinking about that.
I am a bit surprised that you found this post so novel.
I was too! Many of the points were indeed old.
How is this different from what MIRI etc has been saying for ages?
Recently even MIRI seems to have had the position that LLMs might bring us to AGI and they seem to have been concerned about LLM scaling. E.g. Eliezer's TIME letter; or Joe Rogero saying to me that:
Anthropic is indeed trying. Unfortunately, they are not succeeding, and they don't appear to be on track to notice this fact and actually stop.
If Anthropic does not keep up with the reckless scaling of e.g. OpenAI, they will likely cease to attract investment and wither on the vine. But aligning superintelligence is harder than building it. A handful of alignment researchers working alongside capabilities folks aren't going to cut it. Anthropic cannot afford to delay scaling; even if their alignment researchers advised against training the next model, Anthropic could not afford to heed them for long.
This sounds to me like it's assuming that if you keep scaling LLMs, then you'll eventually get to superintelligence. So I thought something like "hmm, MIRI seems to assume that we'll go from LLMs to superintelligence, but LLMs seem much easier to align than the AIs in MIRI's classic scenarios, and work to scale them will probably slow down eventually, so that will also give us more time". There's also been a lot of discussion focused on things like AI 2027 that likewise assumes this. And when everyone was pointing so intensely at doom-from-LLMs scenarios, it was easy to let my attention go only to those, and I forgot to think about the case of non-LLM AGI.
Specifically have you read these posts and corresponding discussion?
Brain efficiency, Doom Part 1, Part 2
If I had, I didn't remember much of them. Skimming through them now, I think the specific position they're criticizing doesn't feel very cruxy to me. (Or rather, if Eliezer were right, then that would certainly be a compelling route for AI doom; but there are many ways by which AIs can become more capable than humans, and "having hardware that's more efficient than the human brain" is only one of them. Computers are already superhuman in a lot of different domains without needing greater hardware efficiency for that.)
- Are you working with SOTA model? Here, mathematicians report a quite different story https://www.scientificamerican.com/article/inside-the-secret-meeting-where-mathematicians-struggled-to-outsmart-ai/
The account in that story contradicts other, less openly enthusiastic reports of o4-mini's performance that I've seen, e.g. Greg Burnham:
OpenAI released o3 and o4-mini last week. I’ll use this post to review the new models against standard math benchmarks, as well as on some of the less-formal evaluations I’ve developed in previous posts. [...]
TL;DR: in terms of math, the new models are an incremental step forward compared to OpenAI’s previous models. Their progress comes from tightening up existing capabilities rather than breaking new ground. They are roughly tied with Gemini 2.5 Pro as “best math model”. From the anecdotal excitement I’ve seen about these new models, it seems like OpenAI achieved more progress in other, non-math areas. I wouldn’t be surprised if that trend continues.
and Daniel Litt (his opening message says he's testing both o3 and o4-mini, but he later only talks about testing "it"; I presume that if he tested both, he's referring to whichever got better results, which is supported by the "Incidentally, o4-mini-high seems a bit better than o3 for this kind of thing, though I'm not confident about that" later in the thread):
In this thread I'll record some brief impressions from trying to use o3/o4-mini (the new OpenAI models) for mathematical tasks. [...]
I asked it for comments on my latest paper. Here I was very underwhelmed; it hallucinated several results, outlines of arguments were wrong. Several comments seemed not to refer correctly to anything of the paper. FWIW Gemini 2.5 pro seems to do much better at this task.
I tried a few of my standard test questions; here it did somewhat better than past OpenAI models, and comparable to or better than other SOTA models. One thing I found very impressive was its use of tools to find examples.
For example, I asked "does there exist a 2x2 integer matrix which is diagonalizable mod p for all p and over Q, but not over Z," and it correctly found an example (rather than "proving" that the answer is "no," which is typical for other models). [...]
I asked for a sketch of Katz's proof of the p-curvature conjecture for Gauss-Manin connections. The results were disappointing; it could not reliably get the statement correct, and it never correctly summarized the proof. This is a well-exposited argument, so I was surprised.
I asked it about the statistics of lines in a random cubic surface over a finite field; it correctly understood that this is really a question about the statistics of a certain transitive action of the finite group W(E_6). (Previous models also have understood this.)
Like Gemini 2.5 pro it attempted to do some computation of these statistics and then claimed to give a precise answer; this was completely hallucinated. That is, it made up a number p and wrote code that amounted to "print p," claiming it was actually computing something. [...]
I asked it to solve some problems from the exam I'm administering today, and from some homeworks I've assigned this semester; it did very well (as did o1/o3-mini before it).
A few proofs had bluffs but it certainly would have received a passing grade, maybe even an A. FWIW I do think this is bad behavior--bluffing in proofs is generally a bad sign for research potential. It's not truth-seeking. [...]
I asked for proofs of a few basic lemmas in a paper I'm writing. All of these were fatally flawed. [...]
I asked about the proofs of a few standard but non-trivial facts from representation theory. Here results were mixed; some arguments were excellent but some were nonsense.
For example, let G be a finite group and V a complex irreducible representation of G. Then dim(V) divides |G|. The proof is a slightly tricky argument using integrality of values of characters. It produced several wrong "elementary" arguments.
It correctly alluded to using algebraic integrality but I was never able to get it to give a correct proof. This is surprising because the answer is very googleable.
(Incidentally, o4-mini-high seems a bit better than o3 for this kind of thing, though I'm not confident about that.)
I asked it to prove a difficult theorem of Serre about tensor products of semisimple representations in positive characteristic. Here it correctly recognized and cited the result of Serre; unlike past models it didn't try to give an elementary (wrong) proof without prodding.
When asked to sketch a proof, as before the results were nonsense.
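As an aside, the 2x2 matrix question quoted above is fun because it's mechanically checkable. Here's a minimal sketch of such a check in Python; the candidate matrix is my own (the thread doesn't quote the model's example), and the mod-p loop only needs to cover the primes dividing the eigenvalue gap, since distinct eigenvalues mod p already guarantee diagonalizability:

```python
from math import gcd

# Candidate matrix (my own example): A = [[4, 2], [0, 0]],
# with integer eigenvalues 4 and 0.
a, b, c, d = 4, 2, 0, 0
eigenvalues = [4, 0]

def primitive_eigenvector(lam):
    # An eigenvector is orthogonal to any nonzero row of A - lam*I
    # (for a singular 2x2 matrix the rows are proportional).
    for r1, r2 in ((a - lam, b), (c, d - lam)):
        if (r1, r2) != (0, 0):
            g = gcd(r1, r2)
            return (r2 // g, -r1 // g)
    return (1, 0)  # A == lam*I: every vector is an eigenvector

v, w = map(primitive_eigenvector, eigenvalues)
index = abs(v[0] * w[1] - v[1] * w[0])  # index of span(v, w) as a sublattice of Z^2
print("diagonalizable over Z:", index == 1)  # False: the index is 2

def diagonalizable_mod_p(p):
    ap, bp, cp, dp = a % p, b % p, c % p, d % p
    if bp == cp == 0 and ap == dp:  # scalar mod p (this happens at p = 2)
        return True
    tr, det = (ap + dp) % p, (ap * dp - bp * cp) % p
    # Diagonalizable over F_p iff the char poly has two distinct roots there.
    return len({x for x in range(p) if (x * x - tr * x + det) % p == 0}) == 2

# Only p = 2 (dividing the eigenvalue gap 4 - 0) could fail; for odd p the
# eigenvalues 4 and 0 stay distinct mod p. A few extra primes as a sanity check:
print(all(diagonalizable_mod_p(p) for p in (2, 3, 5, 7, 11, 13)))  # True
```

The trick is that the eigenvalue gap (4) isn't squarefree: mod 2 the matrix vanishes entirely and is trivially diagonalizable, while over Z its eigenvectors (1, 0) and (1, -2) only span an index-2 sublattice.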
It used to be that I would sometimes read something and interpret it to mean X (sometimes even when the author had expressed it sloppily). Then I would say "I think the author meant X" and get into arguments with people who thought the author meant something different. These arguments would be very frustrating, since no matter how certain I was of my interpretation, short of asking the author there was no way to determine who was right.
At some point I realized that there was no reason to make claims about the author's intent. Instead of saying "I think the author meant X", I could just say "this reads to me as saying X". Now I'm only reporting on how I'm personally interpreting their words, regardless of what they might have meant. That both avoids pointless arguments about what the author really meant, and is more epistemically sensible, since in most cases I don't know that my reading of the words is what the author really intended.
Of course, sometimes I might have reason to believe that I do know the author's intent. For example, if I've spent quite some time discussing X with the author directly, and have a good understanding of how they think about the topic. In those cases I might still make claims of their intent. But generally I've stopped making such claims, which has saved me from plenty of pointless arguments.