All of OneManyNone's Comments + Replies

Why I Believe LLMs Do Not Have Human-like Emotions

I agree. I made this point and that is why I did not try to argue that LLMs did not have qualia.

But I do believe you can consider necessary conditions and look at their absence. For instance, I can safely declare that a rock does not have qualia, because I know it does not have a brain.

Similarly, I may not be able to measure whether LLMs have emotions, but I can observe that the processes that generated LLMs are highly inconsistent with the processes that caused emotions to emerge in the only case where I know they exist. Pair that with the observation that specific human emotions seem like only one option out of infinitely many, and it makes a strong probabilistic argument.

OneManyNone2y14

This is sort of why I made the argument that we can only consider necessary conditions, and look for their absence.

But more to your point, LLMs and human brains aren't "two agents that are structurally identical." They aren't even close. The fact that a hypothetical built-from-scratch human brain might have the same qualia as humans isn't relevant, because that's not what's being discussed.

Also, unless your process was precisely "attempt to copy the human brain," I find it very unlikely that any AI development process would yield something particularly similar to a human brain.

1Nora Belrose2y

Yeah, I agree they aren't structurally identical. Although I tend to doubt how much the structural differences between deep neural nets and human brains matter. We don't actually have a non-arbitrary way to quantify how different two intelligent systems are internally.

I have explained myself more here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai

OK, I've written a full rebuttal here: https://www.lesswrong.com/posts/EwKk5xdvxhSn3XHsD/don-t-over-anthropomorphize-ai. The key points are at the top.

In relation to your comment specifically, I would say that anger may have that effect on the conversation, but there's nothing that actually incentivizes the system to behave that way - the slightest hint of anger or emotion would be immediate negative reward during RLHF training. Compare to a human: There may actually be some positive reward to anger, but even if there isn't evolution still allowed to get a...

OneManyNone2y30

Hmmm... I think I still disagree, but I'll need to process what you're saying and try to get more into the heart of my disagreement. I'll respond when I've thought it over.

Thank you for the interesting debate. I hope you did not perceive me as being overly combative.

3the gears to ascension2y

Nah I think you may have been responding to me being unnecessarily blunt. Sorry about that haha!

OneManyNone2y50

I see, but I'm still not convinced. Humans behave in anger as a way to forcibly change a situation into one that is favorable to themselves. I don't believe that's what the AI was doing, or trying to do.

I feel like there's a thin line I'm trying to walk here, and I'm not doing a very good job. I'm not trying to comment on whether or not the AI has any sort of subjective experience. I'm just saying that even if it did, I do not believe it would bear any resemblance to what we as humans experience as anger.

6the gears to ascension2y

I've repeatedly argued that it does, that it is similar, and that this is for mechanistic reasons not simply due to previous aesthetic vibes in the pretraining data; certainly it's a different flavor of reward which is bound to the cultural encoding of anger differently, yes.

OneManyNone2y30

Ah okay. My apologies for misunderstanding.

Okay, sure. But those "bugs" are probably something the AI risk community should take seriously.

3the gears to ascension2y

I am not disagreeing with you in any of my comments and I've strong upvoted your post; your point is very good. I'm disagreeing with fragments to add detail, but I agree with the bulk of it.

OneManyNone2y30

I would argue that "models generated by RL-first approaches" are not more likely to be the primary threat to humanity, because those models are unlikely to yield AGI any time soon. I personally believe this is a fundamental fact about RL-first approaches, but even if it weren't, it's still less likely because LLMs are what everyone is investing in right now and it seems plausible that LLMs could achieve AGI.

Also, by what mechanism would Bing's AI actually be experiencing anger? The emotion of anger in humans is generally associated with a strong negative reward signal. The behaviors that Bing exhibited were not brought on by any associated negative reward; they were just contextual text completion.

4the gears to ascension2y

Oh and, what kind of RL models will be powerful enough to be dangerous? Things like dreamerv3.

6the gears to ascension2y

Yup, anticipation of being pushed by the user into a strong negative reward! The prompt describes a lot of rules and the model has been RLHFed to enforce them on both sides of the conversation; anger is one of the standard ways to enact agency on another being in response to anticipated reward, yup.

Those are examples of LLMs being rational. LLMs are often rational and will only get better at being rational as they improve. But I'm trying to focus on the times when LLMs are irrational.

I agree that AI is aggregating its knowledge to perform rationally. But that still doesn't mean anything with respect to its capacity to be irrational.

4the gears to ascension2y

There's the underlying rationality of the predictor and the second order rationality of the simulacra. Rather like the highly rational intuitive reasoning of humans modulo some bugs, and much less rational high level thought.

Imagine a graph with "LLM capacity" on the x axis and "number of irrational failure modes" on the y axis. Yes, there's a lot of evidence this line slopes downward. But there is absolutely no guarantee that it reaches zero before whatever threshold gets us to AGI.
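To make the shape of that graph concrete, here is a minimal sketch. It assumes a hypothetical curve in which failure modes decay with capacity but level off at a floor; the function name, the constants, and the AGI threshold are all made-up illustrations, not measurements of any real system.

```python
import math

# Assumed capacity at which the system would count as AGI (illustrative only).
AGI_THRESHOLD = 10.0


def failure_modes(capacity, start=100.0, floor=5.0, rate=0.5):
    """Hypothetical count of irrational failure modes at a given capacity.

    The curve decays exponentially from `start` toward `floor`, so it always
    slopes downward but never reaches zero. Every constant is an assumption.
    """
    return floor + (start - floor) * math.exp(-rate * capacity)


def slopes_downward(capacities):
    """True if the curve is strictly decreasing across the given capacities."""
    ys = [failure_modes(c) for c in capacities]
    return all(a > b for a, b in zip(ys, ys[1:]))


if __name__ == "__main__":
    print("slopes downward:", slopes_downward([0, 2, 4, 6, 8, 10]))
    print("failure modes left at threshold:", failure_modes(AGI_THRESHOLD))
```

A downward slope is consistent with the evidence and still leaves a nonzero count at the threshold; whether the real curve looks like this is exactly the open question.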

And I did say that I didn't consider the rationality of GPT systems fake just because it was emulated. That said, I don't totally agree with EY's post - LLMs are in fact imitators. Because they're very good imitators, you can tell them to imitate something rational and they'll do a really good job ...

2Vladimir_Nesov2y

The point is that there's evidence that LLMs might be getting a separate non-emulated version already at the current scale. There is reasoning from emulating people showing their work, and reasoning from predicting their results in any way that works despite the work not being shown. Which requires either making use of other cases of work being shown, or attaining the necessary cognitive processes in some other way, in which case the processes don't necessarily resemble human reasoning, and in that sense they are not imitating human reasoning. As I've noted in a comment to that post, I'm still not sure that LLM reasoning ends up being very different, even if we are talking about what's going on inside rather than what the masks are saying out loud, it might convergently end up in approximately the same place. Though Hinton's recent reminders of how many more facts LLMs manage to squeeze into fewer parameters than human brains have somewhat shaken that intuition for me.

Fair enough, once again I concede your point about definitions. I don't want to play that game either.

But I do have a point which I think is very relevant to the topic of AI Risk: rationality in LLMs is incidental. It exists because the system is emulating rationality it has seen elsewhere. That doesn't make it "fake" rationality, but it does make it brittle. It means that there's a failure mode where the system stops emulating rationality, and starts emulating something else.

2Vladimir_Nesov2y

That's unclear. GPT-4 in particular seems to be demonstrating ability to do complicated reasoning without thinking out loud. So even if this is bootstrapped from observing related patterns of reasoning in the dataset, it might be running chain-of-thought along the residual stream rather than along the generated token sequences, and that might be much less brittle. Its observability in the tokens would be brittle, but it's a question for interpretability how brittle it actually is.

OneManyNone2y32

I was aware of that, and maybe my statement was too strong, but fundamentally I don't know if I agree that you can just claim that it's rational even though it doesn't produce rational outputs.

Rationality is the process of getting to the outputs. What I was trying to talk about wasn't scholarly disposition or non-eccentricity, but the actual process of deciding goals.

Maybe another way to say it is this: LLMs are capable of being rational, but they are also capable of being extremely irrational, in the sense that, to quote EY, their behavior is ...

2Vladimir_Nesov2y

I think this is true in the sense that a falling tree doesn't make a sound if nobody hears it, there is a culpability assignment game here that doesn't address what actually happens. So if we are playing this game, a broken machine is certainly not good at doing things, but the capability is more centrally in the machine, not in the condition of not being broken. It's more centrally in the machine in the sense that it's easier to ensure the machine is unbroken than to create the machine out of an unbroken nothing. (For purposes of AI risk, it also matters that the capability is there in the sense that it might get out without being purposefully elicited, if a mesa-optimizer wakes up during pre-training. So that's one non-terminological distinction, though it depends on the premise of this being possible in principle.)

Fair enough. Thank you for the feedback. I have edited the post to elaborate on what I mean.

I wrote it the way I did because I took the statement as obviously true and didn't want to be seen as claiming the opposite. Clearly that understanding was incorrect.

OneManyNone2y31

To that first sentence, I don't want to get lost in semantics here. My specific statement is that the process that takes DNA into a human is probabilistic with respect to the DNA sequence alone. Add in all that other stuff, and maybe at some point it becomes deterministic, but at that point you are no longer discussing the <1GB that makes DNA. If you wanted to be truly deterministic, especially up to the age of 25, I seriously doubt it could be done in less than millions of petabytes, because there are such a huge number of minuscule variations in condi...

1M. Y. Zuo2y

Perhaps I phrased it poorly, let me put it this way. If super-advanced aliens suddenly showed up tomorrow and gave us near-physically-perfect technology, machines, techniques, etc., we could feasibly have a fully deterministic, down to the cell level at least, encoding of any possible individual human stored in a box of hard drives or less. In practical terms I can't even begin to imagine the technology needed to reliably and repeatably capture a 'snapshot' of a living, breathing, human's cellular state, but there's no equivalent of a light speed barrier preventing it.

OneManyNone2y51

I think you're broadly right, but I think it's worth mentioning that DNA is a probabilistic compression (evidence: differences in identical twins), so it gets weird when you talk about compressing an adult at age 25 - what is probabilistic compression at that point?

But I think you've mostly convinced me. Whatever it takes to "encode" a human, it's possible to compress it to be something very small.

0M. Y. Zuo2y

A minor nitpick, DNA, the encoding concept, is not probabilistic, it's everything surrounding such as the packaging, 3D shape, epigenes, etc., plus random mutations, transcription errors, etc., that causes identical twins to deviate. Of course it is so compact because it doesn't bother spending many 'bits' on ancillary capabilities to correct operating errors. But it's at least theoretically possible for it to be deterministic under ideal conditions.

My objection applied at a different level of reasoning. I would argue that anyone who isn't blind understands light at the level I'm talking about. You understand that the colors you see are objects because light is bouncing off them and you know how to interpret that. If you think about it, starting from zero I'm not sure that you would recognize shapes in pictures as objects.

I guess so? I'm not sure what point you're making, so it's hard for me to address it.

My point is that if you want to build something intelligent, you have to do a lot of processing and there's no way around it. Playing several million games of Go counts as a lot of processing.

OneManyNone2y42

Yeah, I agree that it's a surprising fact requiring a bit of updating on my end. But I think the compression point probably matters more than you would think, and I'm finding myself more convinced the more I think about it. A lot of processing goes into turning that 1GB into a brain, and that processing may not be highly reducible. That's sort of what I was getting at, and I'm not totally sure the complexity of that process wouldn't add up to a lot more than 1GB.

It's tempting to think of DNA as sufficiently encoding a human, but (speculatively) it may make...

0M. Y. Zuo2y

Even if you include a very generous epigenetic and womb-environmental component 9x bigger than the DNA component, any possible human baby at birth would need less than 10 GB to describe them completely with DNA levels of compression. A human adult at age 25 would probably need a lot more to cover all possible development scenarios, but even then I can't see it being more than 1000x, so 10TB should be enough. For reference Windows Server 2016 supports 24 TB of RAM, and many petabytes of attached storage.
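The arithmetic behind those figures can be checked in a few lines. This is only a back-of-the-envelope sketch: the roughly 3.1 billion base pairs and the 2-bits-per-base encoding are assumptions brought in to sanity-check the "<1GB" DNA figure used earlier in the thread, while the 9x and 1000x multipliers are the ones proposed in the comment above.

```python
GB = 10**9   # decimal gigabyte, in bytes
TB = 10**12  # decimal terabyte, in bytes

# Assumption: one haploid human genome is roughly 3.1 billion base pairs,
# and each base (A, C, G, T) can be stored in 2 bits with no further compression.
BASE_PAIRS = 3_100_000_000
dna_bytes = BASE_PAIRS * 2 / 8  # about 0.78 GB, under the 1 GB budget

# Upper bounds as stated in the comment: round DNA up to 1 GB, add a
# component 9x that size for epigenetics and womb environment.
dna_budget = 1 * GB
baby_bytes = dna_budget + 9 * dna_budget  # 10 GB

# The comment's guess for an adult at 25: at most 1000x the baby figure.
adult_bytes = 1000 * baby_bytes  # 10 TB

if __name__ == "__main__":
    print("raw DNA (GB):", dna_bytes / GB)
    print("baby upper bound (GB):", baby_bytes / GB)
    print("adult upper bound (TB):", adult_bytes / TB)
```

The numbers are internally consistent; the contested part is whether the multipliers themselves are anywhere near right.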

To your point about the particle filter, my whole point is that you can’t just assume the super intelligence can generate an infinite number of particles, because that takes infinite processing. At the end of the day, superintelligence isn’t magic - those hypotheses have to come from somewhere. They have to be built, and they have to be built sequentially. The only way you get to skip steps is by reusing knowledge that came from somewhere else.
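A toy version of the cost argument, as a minimal sketch: a bootstrap particle filter that estimates one hidden number from noisy observations. The prior range, noise level, jitter, and function name are all illustrative assumptions. The point is only that every hypothesis has to be generated explicitly and scored against every observation, so the work grows with the number of particles.

```python
import math
import random


def particle_filter(observations, n_particles, noise=1.0, seed=0):
    """Estimate a hidden scalar; return (estimate, likelihood_evaluations)."""
    rng = random.Random(seed)
    # Each hypothesis is built one at a time: n_particles explicit draws.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    evaluations = 0
    for obs in observations:
        weights = []
        for p in particles:
            # Gaussian likelihood of the observation under hypothesis p.
            weights.append(math.exp(-0.5 * ((obs - p) / noise) ** 2))
            evaluations += 1
        if sum(weights) == 0.0:
            # Every hypothesis is equally bad; fall back to uniform weights.
            weights = [1.0] * n_particles
        # Resample in proportion to weight, then jitter to keep diversity.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        particles = [p + rng.gauss(0.0, 0.1) for p in particles]
    return sum(particles) / n_particles, evaluations


if __name__ == "__main__":
    estimate, cost = particle_filter([3.0] * 5, n_particles=200)
    print("estimate:", estimate, "likelihood evaluations:", cost)
```

Doubling the number of particles doubles the likelihood evaluations, and an unbounded hypothesis set would need unbounded work.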

Take a look at the game of Go. The computational limits on the number of games that could be simulated made this “...

1Hastings2y

Let's assume that as part of pondering the three webcam frames, the AI thought of the rules of Go, ignoring how likely this is. In that circumstance, in your framing of the question, would it be allowed to play several million games against itself to see if that helped it explain the arrays of pixels?

OneManyNone2y72

Yes, I wasn’t sure if it was wise to use TSP as an example for that reason. Originally I wrote it using the Hamiltonian Path problem, but thought a non-technical reader would be more able to quickly understand TSP. Maybe that was a mistake. It also seems I may have underestimated how technical my audience would be.

But your point about heuristics is right. That’s basically what I think an AGI based on LLMs would do to figure out the world. However, I doubt there would be one heuristic which could do Solomonoff induction in all scenarios, or even most. Which means you’d have to select the right one, which means you’d need a selection criterion, which takes us back to my original points.
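For readers less familiar with TSP, here is a minimal sketch of the exact-versus-heuristic trade-off being discussed. Nearest-neighbor is just one well-known heuristic picked for illustration, and the function names and sample points are made up. Brute force checks every tour and is only feasible for a handful of cities; the heuristic is fast but carries no optimality guarantee.

```python
import itertools
import math


def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n)
    )


def brute_force_tsp(points):
    """Exact answer: try every tour starting at city 0 (factorial time)."""
    rest = range(1, len(points))
    candidates = ((0,) + perm for perm in itertools.permutations(rest))
    return list(min(candidates, key=lambda order: tour_length(points, order)))


def nearest_neighbor_tsp(points):
    """Heuristic: always move to the closest unvisited city (quadratic time)."""
    unvisited = set(range(1, len(points)))
    order = [0]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


if __name__ == "__main__":
    cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 1), (7, 0)]
    exact = brute_force_tsp(cities)
    greedy = nearest_neighbor_tsp(cities)
    print("exact :", exact, tour_length(cities, exact))
    print("greedy:", greedy, tour_length(cities, greedy))
```

The heuristic's tour can never be shorter than the exact one, and how much longer it is depends on the instance, which is where the selection problem comes back in.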