Kaj_Sotala


Comments


Thanks, that's helpful. My impression from o1 is that it does something that could be called mental simulation for domains like math, where the "simulation" can in fact be represented with just writing (or equations more specifically). But I think that writing is an efficient format for mental simulation in only a very small number of domains.

(Hmm I was expecting that this would get more upvotes. Too obvious? Not obvious enough?)

Hoping that we're more than a decade from transformative AGI now seems wildly optimistic to me. There could be dramatic roadblocks I haven't foreseen, but most of those would just push it past three years.

Self-driving cars seem like a useful reference point. Back when cars got unexpectedly good performance at the 2005 and 2007 DARPA grand challenges, there was a lot of hype about how self-driving cars were just around the corner now that they had demonstrated having the basic capability. 17 years later, we're only at this point (Wikipedia):

As of late 2024, no system has achieved full autonomy (SAE Level 5). In December 2020, Waymo was the first to offer rides in self-driving taxis to the public in limited geographic areas (SAE Level 4),[7] and as of April 2024 offers services in Arizona (Phoenix) and California (San Francisco and Los Angeles). [...] In July 2021, DeepRoute.ai started offering self-driving taxi rides in Shenzhen, China. Starting in February 2022, Cruise offered self-driving taxi service in San Francisco,[11] but suspended service in 2023. In 2021, Honda was the first manufacturer to sell an SAE Level 3 car,[12][13][14] followed by Mercedes-Benz in 2023.

And self-driving capability should be vastly easier than general intelligence. Like self-driving, transformative AI also requires reliable worst-case performance rather than just good average-case performance, and there's usually a surprising amount of detail involved that you need to sort out before you get to that point.

What could plausibly take us from now to AGI within 10 years?

A friend shared the following question on Facebook:

So, I've seen multiple articles recently by people who seem well-informed that claim that AGI (artificial general intelligence, aka software that can actually think and is creative) is less than 10 years away, and I find that baffling, and am wondering if there's anything I'm missing. Sure, modern AI like ChatGPT are impressive - they can do utterly amazing search engine-like things, but they aren't creative at all.

The clearest example of this I've seen comes from people's experiences with AI writing code. From what I've read, AI can do exceptionally well with this task, but only if there are examples of the needed sort of code online that it can access or was trained on; if it lacks this, its accuracy is quite bad with easy problems and essentially non-existent with problems that are at all difficult. This clearly says to me that current AI are glorified very impressive search engines, and that's nowhere near what I'd consider AGI and doesn't look like it could become AGI.

Am I missing something?

I replied with some of my thoughts as follows:

I have also been a little confused by the shortness of some of the AGI timelines that people have been proposing, and I agree that there are types of creativity that they're missing, but saying that they're not creative at all sounds too strong. I've been using Claude as a co-writing partner for some fiction and it has felt creative to me. See also e.g. this example of a conversation that someone had with it.

In 2017 I did a small literature review on human expertise, which to me suggested that expertise can broadly be divided into two interacting components: pattern recognition and mental simulation. Pattern recognition is what current LLMs do, essentially. Mental simulation is the bit that they're missing - if a human programmer is facing a novel programming challenge, they can attack it from first principles and simulate the program execution in their head to see what needs to be done.
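To make the distinction concrete, here's a toy sketch (my illustration, not anything from the literature review) of what "mental simulation" looks like when made literal: stepping through a program's state transition by transition, rather than retrieving similar code seen before. The `simulate` helper and the toy program are both invented for illustration.

```python
def simulate(program_steps, state):
    """Apply each step to the state, recording a snapshot after every step."""
    trace = [dict(state)]
    for step in program_steps:
        step(state)
        trace.append(dict(state))
    return trace

# A novel toy "program": no lookup of previously-seen examples helps here;
# the only way to know the outcome is to actually run the state forward.
steps = [
    lambda s: s.update(x=s["x"] + s["y"]),
    lambda s: s.update(y=s["x"] * 2),
    lambda s: s.update(x=s["x"] - s["y"]),
]
trace = simulate(steps, {"x": 1, "y": 2})
```

The full trace of intermediate states is what a programmer holds in their head when reasoning "from first principles" about code they've never seen a pattern for.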

The big question would then be something like "how hard would it be to add mental simulation to LLMs". Some indications that it wouldn't necessarily be that hard:

* In humans, while they are distinct capabilities, the two also seem to be intertwined. If I'm writing a social media comment and I try to mentally simulate how it will be received, I can do it because I have a rich library of patterns about how different kinds of comments will be received by different readers. If I write something that triggers a pattern-detector that goes "uh-oh, that wouldn't be received well", I can rewrite it until it passes my mental simulation. That suggests that there would be a natural connection between the two.
* There are indications that current LLMs may already be doing something like internal simulation, though without being very good at it. In the "mouse mastermind" vignette, for instance, it certainly intuitively feels like Claude has some kind of consistent internal model of what's going on. People have also e.g. trained LLMs to play games like Othello and found that the resulting network has an internal representation of the game board ( https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world ).
* There have also been various attempts at explicitly combining an LLM-based component with a component that does something like simulation. E.g. DeepMind trained a hybrid LLM-theorem prover system that reached silver medal-level performance on this year's International Mathematics Olympiad ( https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ ), where the theorem prover component maintains a type of state over the math problem as it's being worked on.
* Iterative improvements like chain-of-thought reasoning are also taking LLMs in the direction of being able to apply more novel reasoning in domains such as math. Mathematician Terry Tao commented the following about giving the recent GPT-o1 model research-level math tasks to work on: 

> The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, (static simulation of a) graduate student.  However, this was an improvement over previous models, whose capability was closer to an actually incompetent (static simulation of a) graduate student.  It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "(static simulation of a) competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks.

* There have also been other papers trying out various techniques such as "whiteboard of thought" ( https://whiteboard.cs.columbia.edu/ ) where an LLM, when being presented with visual problems in verbal format, explicitly generates visual representations of the verbal description to use as an aid in its reasoning. It feels like a relatively obvious idea would be to roll out these kinds of approaches into future LLM architectures, teaching them to generate "mental images" of whatever task they were told to work on. This could then be used as part of an internal simulation.
* There's an evolutionary argument that the steps from "pure pattern recognition" to "pattern recognition with mental simulation added" might be relatively simple and not require that much in the way of fundamental breakthroughs, since evolution managed to find it in humans, and in humans those abilities seem to be relatively continuous with each other. So we might expect all of these iterative improvements to take us pretty smoothly toward AGI.
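As an illustration of the linear-probe methodology behind the Othello-GPT result mentioned above, here is a minimal self-contained sketch using synthetic data in place of real transformer activations (the dimensions, sample count, and "occupancy" feature are all invented for illustration): if a feature is linearly encoded in hidden states, a simple least-squares probe can read it out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for transformer hidden states: 200 samples, 16 dimensions.
# Assume some feature (e.g. "is this board square occupied?") is linearly
# encoded along a fixed direction in activation space.
true_direction = rng.normal(size=16)
hidden = rng.normal(size=(200, 16))
labels = (hidden @ true_direction > 0).astype(float)

# Linear probe: least-squares fit from hidden states to (centered) labels.
w, *_ = np.linalg.lstsq(hidden, labels - 0.5, rcond=None)
predictions = (hidden @ w > 0).astype(float)
accuracy = (predictions == labels).mean()
```

High probe accuracy is taken as evidence that the representation exists inside the network; the Othello-GPT work's contribution was showing this on a real model's activations rather than synthetic data like the above.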

Focus: Allow Roman Yampolskiy to continue his research and pursue a PhD

Huh? Roman not only does have a PhD already, he's a tenured associate professor. Maybe this meant money to allow him to have PhD students - on a few occasions he suggested that I do an AI Safety-focused PhD with him.

Indeed, and there's another big reason for that - trying to always override your short-term "monkey brain" impulses just doesn't work that well for most people.

+1.

Which is a good thing, in this particular case, yes?

Less smoking does seem better than more smoking. Though generally it doesn't seem to me like social stigma would be a very effective way of reducing unhealthy behaviors - lots of those behaviors are ubiquitous despite being somewhat low-status. I think the problem is at least threefold:

  • As already mentioned, social stigma tends to cause optimization to avoid having the appearance of doing the low-status thing, instead of optimization to avoid doing the low-status thing. (To be clear, it does cause the latter too, but it doesn't cause the latter anywhere near exclusively.)
  • Social stigma easily causes counter-reactions where people turn the stigmatized thing into an outright virtue, or at least start aggressively holding that it's not actually that bad.
  • Shame makes things wonky in various ways. E.g. someone who feels they're out of shape may feel so much shame about the thought of doing badly if they try to exercise that they don't even try. For compulsive habits like smoking, there's often a loop where someone feels bad, turns to smoking to feel momentarily better, then feels even worse for having smoked, then because they feel even worse they are drawn even more strongly into smoking to feel momentarily better, etc.

I think generally people can maintain healthy habits much more consistently if their motivation comes from genuinely believing in the health benefits and wanting to feel better. But of course that's harder to spread on a mass scale, especially since not everyone actually feels better from healthy habits (e.g. some people feel better from exercise but some don't).

Then again, for the specific example of smoking in particular, stigma does seem to have reduced the amount of it (in part due to mechanisms like indoor smoking bans), so sometimes it does work anyway.

Incidentally, coherence therapy (which I know is one of the things Chris is drawing from) makes the distinction between three types of depression, some of them being strategies and some not. Also I recall Unlocking the Emotional Brain mentioning a fourth type which is purely biochemical.

From Coherence Therapy: Practice Manual & Training Guide:

Underlying emotional truth of depression: Three types

A. Depression that directly carries out an unconscious purpose/function
B. Depression that is a by-product of how an unconscious purpose is carried out
C. Depression expressing unconscious despair/grief/hopelessness

A. Depression that carries out an unconscious purpose

Client: Mother who is still in pained, debilitating depression 8 years after her 5-year-old son died after being hit by a car. (To view entire session see video 1096T, Stuck in Depression.) The following excerpt shows the creation of discovery experiences that reveal the powerful purpose of staying in depression (a purpose often encountered with clients in the bereavement process).

Th: I want you to look and see if there’s some other side of you, some area in your feelings where you feel you don’t deserve to be happy again.
Cl: Probably the guilt.
Th: The guilt. So what are the words of the guilt?
Cl: That I wasn’t outside when he was hit (to prevent it).
Th: I should have been outside.
Cl: I should have been outside.
Th: It’s my fault.
Cl: It’s my fault.

(About two minutes later:)

Th: Would you try to talk to me from the part of you that feels the guilt. Just from that side. I know there are these other sides. But from the place in you where you feel guilty, where you feel it was your fault that your dear little boy got hit by a truck, from that place, what’s the emotional truth for you — from that place — about whether it’s OK to feel happy again?
Cl: ...I don’t allow myself to be happy.
Th: [Very softly:] How come? How come?
Cl: How come?
Th: Because if you were happy—would you complete that sentence? “I don’t allow myself to be happy because if I were happy—”
Cl: I would have to forgive myself. [Pause.] And I’ve been unwilling to do that.
Th: Good. So keep going. “I’m unwilling to forgive myself because—”
Cl: You know there are parts of me that I think it’s about not wanting to go on myself without him.
And if I keep this going then I don’t have to do that.
Th: I see. So would you see him again? Picture Billy? And just try saying that to Billy. Try saying to him, ”I’m afraid that if I forgive myself I’ll lose connection with you and I’ll go on without you.”
Cl: [With much feeling:] Billy, even though I can picture you as a little angel I’m afraid to forgive myself—that you’ll go away and I don’t want you to go away.
Th: Yeah. And see if it’s true to say to him, “It’s so important for me to stay connected to you that I’m willing to not forgive myself forever. I’d rather be feeling guilty and not forgiving myself than lose contact with you and move on without you.” Try saying that. See if that feels true.
Cl: [Sighs. With much feeling:] Billy, I just feel like I would do anything to keep this connection with you including staying miserable and not forgiving myself for the rest of my life. And you know that’s true. [Her purpose for staying in depression is now explicit and directly experienced.]

B. Depression that is a by-product of how an unconscious purpose is carried out

Client: Lethargic woman, 33, says, “I’ve been feeling depressed and lousy for years… I have a black cloud around me all the time.” She describes herself as having absolutely no interests and as caring about nothing whatsoever, and expresses strong negative judgments toward herself for being a “vegetable.”

[Details of this example are in the 2002 publication cited in bibliography on p. 85. Several pro-symptom positions for depression were found and dissolved. The following account is from her sixth and final session.]

Discovery via symptom deprivation: Therapist prompts her to imagine having real interests; unhurriedly persists with this imaginal focus. Client suddenly exclaims, “I erased myself!” and describes how “my mother takes everything! She fucking takes it all! So I’ve got to erase myself! She always, always, always makes it her accomplishment, not mine. So why should I be anything? So I erased myself, so she couldn’t keep doing that to me.” Client now experiences her blankness as her own solution to her problem of psychological robbery, and recognizes her depression to be an inevitable by-product of living in the blankness that is crucial for safety but makes her future hopelessly empty.

Therapist then continues discovery into why “erasing” herself is the necessary way to be safe: Client brings to light a core presupposition of having no boundaries with mother, a “no walls rule.” With this awareness dawns the possibility of having “walls” so that what she thinks, feels or does remains private and cannot be stolen. She could then safely have interests and accomplishments. This new possibility immediately creates for client the tangible prospect of an appealing future, and she congruently describes rich feelings of excitement and energy.

Outcome: In response to follow-up query two months later, client reported, “It felt like a major breakthrough...this major rage got lifted” and said she had maintained privacy from mother around all significant personal matters. After two years she confirmed that the “black cloud” was gone, she was enthusiastically pursuing a new career, was off antidepressants, and said, “Things are good, in many ways. Things are very good.”

C. Depression expressing unconscious despair, grief, hopelessness

Client: Man with long history of a “drop” into depression every Fall. [This one-session example is video 1097SP, Down Every Year, available online at coherencetherapy.org. For a multi-session example of working with this type of depression, see “Unhappy No Matter What” in DOBT book, pp. 63-90.]

Surfaced emotional reality: At 10 he formed a belief that he failed parents’ expectations so severely that they forever “gave up on me” (he was sent in the Fall from USA to boarding school in Europe, was utterly miserable and begged to come home). Has been in despair ever since, unconsciously.

Outcome: Client subsequently initiated talk with parents about the incident 30 years ago; not once had it been discussed. In this conversation it became real to him that their behavior did not mean they gave up on him, and five months after session reported continuing relief from feeling depressed and inadequate.

Commenting on a relatively isolated point in what you wrote; none of this affects your core point about preferences being entangled with predictions (actually it relies on it).

> This is why you could view a smoker's preference for another cigarette as irrational: the 'core want' is just a simple preference for the general feel of smoking a cigarette, but the short-jolt preference has the added prediction of "and this will be good to do". But that added prediction is false and inconsistent with everything they know. The usual statement of "you would regret this in the future".

I think that the short-jolt preference's prediction is actually often correct; it's just over a shorter time horizon. The short-term preference predicts that "if I take this smoke, then I will feel better", and it is correct. The long-term preference predicts that "I will later regret taking this smoke," and it is also correct. Neither preference is irrational; they're just optimizing over different goals and timescales.

Now it would certainly be tempting to define rationality as something like "only taking actions that you endorse in the long term", but I'd be cautious of that. Some long-term preferences are genuinely that, but many of them are also optimizing for something looking good socially, while failing to model any of the genuine benefits of the socially-unpopular short-term actions. 

For example, smoking a cigarette often gives smokers a temporary feeling of being in control, and if they are going out to smoke together with others, a break and some social connection. It is certainly valid to look at those benefits and judge that they are still not worth the long-term costs... but frequently the "long-term" preference may be based on something like "smoking is bad and uncool and I shouldn't do it, and I should never say that there could be a valid reason to do it, for otherwise everyone will scold me".

Then by maintaining both the short-term preference (which continues the smoking habit) and the long-term preference (which might make socially-visible attempts to stop smoking), the person may be getting the benefit from smoking while also avoiding some of the social costs of continuing.

This is obviously not to say that the costs of smoking would only be social. Of course there are genuine health reasons as well. But I think that quite a few people who care about "health" actually care about not appearing low status by doing things that everyone knows are unhealthy. 

Though even if that weren't the case - how do you weigh the pleasure of a cigarette now against an increased probability of various health issues some time in the future? It's certainly valid to say that better health in the future outweighs the pleasure in the now, but there's also no objective criterion for why that should be; you could equally consistently put things the other way around.

So I don't think that smoking a cigarette is necessarily irrational in the sense of making an incorrect prediction. It's more like a correct but only locally optimal prediction. (Though it's also valid to define rationality as something like "globally optimal behavior", or as the thing that you'd do if you got both the long-term and the short-term preference to see each other's points and then make a decision that took all the benefits and harms into consideration.)
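The "no objective criterion" point can be made precise with a standard discounted-utility calculation (all the utility numbers below are invented for illustration): whether smoking now nets out as positive or negative depends entirely on the discount rate you bring to the comparison.

```python
# Invented utilities: an immediate pleasure now vs. a health cost ten years out.
pleasure_now = 1.0
future_health_cost = 5.0
years_until_cost = 10

def net_value_of_smoking(discount_rate):
    # Exponential discounting: a cost T years away is divided by (1 + r)^T.
    discounted_cost = future_health_cost / ((1 + discount_rate) ** years_until_cost)
    return pleasure_now - discounted_cost

patient = net_value_of_smoking(0.02)    # weighs the future heavily -> net negative
impatient = net_value_of_smoking(0.30)  # discounts the future steeply -> net positive
```

Both evaluations apply the same arithmetic consistently; the disagreement is entirely in the choice of discount rate, which is exactly the sense in which neither preference is making an incorrect prediction.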

I have a friend with eidetic imagination who says that for her, there is literally no difference between seeing something and imagining it. Sometimes she's worried about losing track of reality if she were to imagine too much.

Oh yeah, this. I used to think that "argh" or "it hurts" were just hyperbolic compliments for an excellent pun. Turns out, puns actually are painful to some people.
