All of Rafael Harth's Comments + Replies

For those who work on Windows, a nice little quality-of-life improvement for me was just to hide desktop icons and do everything by searching in the taskbar. (Would be even better if the search function wasn't so odd.) Been doing this for about two years and like it much more.

Maybe for others, using the desktop is actually worth it, but for me, it always cluttered up over time, and the annoyance over it not looking the way I want always outweighed the benefits. It really takes barely longer to go CTRL+ESC+"firef"+ENTER than to double-click an icon.
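(If anyone wants to script the icon-hiding part instead of using the desktop right-click menu, here's a minimal sketch. It assumes the usual HideIcons registry value under Explorer\Advanced and restarts Explorer to apply the change; details may vary across Windows versions, so treat it as a starting point, not a guaranteed recipe.)

```python
# Sketch: hide Windows desktop icons by setting the (assumed) HideIcons
# registry value, then restart Explorer so the change takes effect.
import subprocess
import winreg

KEY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Explorer\Advanced"

with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
    # 1 hides desktop icons; set it back to 0 to show them again.
    winreg.SetValueEx(key, "HideIcons", 0, winreg.REG_DWORD, 1)

# Restart Explorer so the desktop picks up the new setting.
subprocess.run(["taskkill", "/f", "/im", "explorer.exe"], check=False)
subprocess.Popen(["explorer.exe"])
```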

1Morpheus
In that case also consider installing PowerToys and pressing Alt+Space to open applications or files (to avoid unhelpful internet searches etc.).
1Dana
I keep some folders (and often some other transient files) on my desktop and pin my main apps to the taskbar. With apps pinned to your taskbar, you can open a new instance with Windows+shift+num (or just Windows+num if the app isn't open yet). I do the same as you and search for any other apps that I don't want to pin.
4Mateusz Bagiński
I have Ubuntu and I also find myself opening apps mostly by searching. I think the only reason I put anything on desktop is to be reminded that these are the things I'm doing/reading at the moment (?).

I don't think I get it. If I read this graph correctly, it seems to say that if you let a human play chess against an engine and want it to achieve equal performance, then the amount of time the human needs to think grows exponentially (as the engine gets stronger). This doesn't make sense if extrapolated downward, but upward it's about what I would expect. You can compensate for skill by applying more brute force, but it becomes exponentially costly, which fits the exponential graph.
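(To make the "exponentially costly brute force" reading concrete, here's a rough sketch -- my own illustration, not something read off the graph. It assumes the standard Elo win-probability model plus the common rule of thumb that each doubling of thinking time buys a roughly constant number of Elo points c.)

```latex
% Illustration only: standard Elo model plus an assumed constant Elo gain c
% per doubling of thinking time.
\[
  P(\text{win}) = \frac{1}{1 + 10^{-\Delta E / 400}},
  \qquad
  t_{\text{needed}} \approx t_0 \cdot 2^{\Delta E / c}.
\]
% Under these assumptions, the thinking time needed to compensate for a
% rating gap \(\Delta E\) grows exponentially in the gap, which matches the
% shape of the graph when extrapolated upward.
```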

It's probably not perfect -- I'd worry a lot about strategic mistakes in the opening -- but it seems pretty good. So I don't get how this is an argument against the metric.

2Gunnar_Zarncke
It is a decent metric for chess but a) it doesn't generalize to other tasks (as people seem to interpret the METR paper), and less importantly, b) I'm quite confident that people wouldn't beat the chess engines by thinking for years.
Answer by Rafael Harth

Not answerable because METR is a flawed measure, imho.

Should I not have begun by talking about background information & explaining my beliefs? Should I have assumed the audience had contextual awareness and gone right into talking about solutions? Or was the problem more along the lines of writing quality, tone, or style?

  • What type of post do you like reading?
  • Would it be alright if I asked for an example so that I could read it?

This is a completely wrong way to think about it, imo. A post isn't this thing with inherent terminal value that you can optimize for regardless of content.

If you think you have an i... (read more)

1Oxidize
Sounds like you're speaking from a set of fundamentally different beliefs than I'm used to. I've trained myself to write assuming that the audience is uninformed about the topic I'm writing about. But it sounds like you're writing from the perspective of the LW community being more informed than I can properly understand or conceptualize. How can I gain more information on the flow of information in the LessWrong community? I assumed any insights I've arrived at as a consequence of my own thinking & conclusions I've reached from various unconnected sources would likely be insights specific to me, but maybe I'm wrong. But yeah, I agree with you that just wanting to write something does not sound like a good place to start to be value-additive to this community. I'll remember to only post when I believe I have valuable and unique insights to share.
Rafael Harth

I really don't think this is a reasonable measure of the ability to do long-term tasks, but I don't have the time or energy to fight this battle, so I'll just register my prediction that this paper is not going to age well.

To offer another data point, I guess: I've had an obsessive nail-removing[1] habit for about 20 years. I concur that it can happen unconsciously; however, noticing it seems to me like 10-20% of the problem; the remaining 80-90% is resisting the urge to follow the habit when you do notice. (As for enjoying it, I think technically yeah, but it's for such a short amount of time that it's never worth it. Maybe if you just gave in and were constantly biting instead of trying to resist for as long as possible, it'd be different.) I also think I've solved the notici... (read more)

Oh, nice! The fact that you didn't make the time explicit in the post made me suspect that it was probably much shorter. But yeah, six months is long enough, imo.

3Rafka
I edited the intro to make this clearer, thanks. 

I would strongly caution against declaring victory too early. I don't know for how long you think you've overcome the habit, but unless it's at least three months, I think you're being premature.

Rafka

That’s why I waited six months before publishing the post :)

A larger number of people, I think, desperately desperately want LLMs to be a smaller deal than what they are.

Can confirm that I'm one of these people (and yes, I worry a lot about this clouding my judgment).

Again, those are theories of consciousness, not definitions of consciousness.

I would agree that people who use consciousness to denote the computational process vs. the fundamental aspect generally have different theories of consciousness, but they're also using the term to denote two different things.

(I think this is because consciousness is notably different from other phenomena -- e.g., fiber decreasing risk of heart disease -- where the phenomenon is relatively uncontroversial and only the theory about how the phenomenon is explained is up for debate. With ... (read more)

2TAG
But that doesn't imply that they disagree about (all of) the meaning of the term "qualia", since denotation (extension, reference) doesn't exhaust meaning. The other thing is connotation, AKA intension, AKA sense. https://en.m.wikipedia.org/wiki/Sense_and_reference Everyone can understand that the qualia are, minimally, things like the-way-a-tomato-seems-to-you, so that's agreement on sense, and the disagreement on whether the referent is "physical property", "nonphysical property", "information processing", etc., arises from different theoretical stances. That's an odd use of "phenomenon"... the physical nature of a heart attack is uncontroversial, and the controversy is about the physical cause. Whereas with qualia, they are phenomenal properly speaking... they are appearances... and yet lack a prima facie interpretation in physical (or information-theoretic) terms. Since qualia do present themselves immediately as phenomenal, outright denial... feigning anaesthesia or zombiehood... is a particularly poor response to the problem. And the problem is different from "how does one physical event cause another one that is subsequent in time"... it's more like "how or whether qualia, i.e. phenomenal consciousness, supervene synchronously on brain states". If you don't like the terminology, you can invent better terminology. Throughout this exchange, you have been talking in terms of "consciousness", and I have been replying in terms of "qualia", because "qualia" is a term that was invented to hone in on the problem, on the aspects of consciousness where it isn't obviously just information processing. (I'm personally OK with using information-theoretic explanations, such as global workspace theory, to address Easy Problem issues, such as Access Consciousness.) There's a lot to be said for addressing terminological issues, but it's not an easy win for camp #1.

I think the ability to autonomously find novel problems to solve will emerge as reasoning models scale up. It will emerge because it is instrumental to solving difficult problems.

This of course is not a sufficient reason. (Demonstration: telepathy will emerge [as evolution improves organisms] because it is instrumental to navigating social situations.) It being instrumental means that there is an incentive -- or to be more precise, a downward slope in the loss function toward areas of model space with that property -- which is one required piece, but it... (read more)

Rafael Harth

Instead of "have LLMs generated novel insights", how about "have LLMs demonstrated the ability to identify which views about a non-formal topic make more or less sense?" This question seems easier to operationalize and I suspect points at a highly related ability.

Fwiw this is the kind of question that has definitely been answered in the training data, so I would not count this as an example of reasoning.

2Yair Halberstadt
I expected so, which is why I was surprised they didn't get it.

I'm just not sure the central claim, that rationalists underestimate the role of luck in intelligence, is true. I've never gotten that impression. At least my assumption going into reading this was already that intelligence was probably 80-90% unearned.

Humans must have gotten this ability from somewhere and it's unlikely the brain has tons of specialized architecture for it.

This is probably a crux; I think the brain does have tons of specialized architecture for it, and if I didn't believe that, I probably wouldn't think thought assessment was as difficult.

The thought generator seems more impressive/fancy/magic-like to me.

Notably people's intuitions about what is impressive/difficult tend to be inversely correlated with reality. The stereotype is (or at least used to be) that AI will be good at ra... (read more)

2Noosphere89
I think this is also a crux. IMO, I think the brain is mostly cortically uniform, ala Steven Byrnes, and in particular I think that the specialized architecture for thought assessment was pretty minimal. The big driver of human success is basically something like the bitter lesson applied to biological brains, combined with humans being very well optimized for tool use, such that they can over time develop technology that is used to dominate the world (it's also helpful that humans can cooperate reasonably below 100 people, which is more than almost all social groups, though I've become much more convinced that cultural learning is way less powerful than Henrich et al have said). (There are papers which show that humans are better at scaling neurons than basically everyone else, but I can't find them right now).

Whether or not every interpretation needs a way to connect measurements to conscious experiences, or whether they need extra machinery?

If we're being extremely pedantic, then KC is about predicting conscious experience (or sensory input data, if you're an illusionist; one can debate what the right data type is). But this only matters for discussing things like Boltzmann brains. As soon as you assume that there exists an external universe, you can forget about your personal experience and just try to estimate the length of the program that runs the univ... (read more)

1Pekka Puupaa
Thank you, this has been a very interesting conversation so far. I originally started writing a much longer reply explaining my position on the interpretation of QM in full, but realized that the explanation would grow so long that it would really need to be its own post. So instead, I'll just make a few shorter remarks. Sorry if these sound a bit snappy. And if one assumes an external universe evolving according to classical laws, the Bohmian interpretation has the lowest KC. If you're going to be baking extra assumptions into your theory, why not go all the way? An interpretation is still a program. All programs have a KC (although it is usually ill-defined). Ultimately I don't think it matters whether we call these objects we're studying theories or interpretations. Has nothing to do with how the universe operates, as I see it. If you'd like, I think we can cast Copenhagen into a more Many Worlds -like framework by considering Many Imaginary Worlds. This is an interpretation, in my opinion functionally equivalent to Copenhagen, where the worlds of MWI are assumed to represent imaginary possibilities rather than real universes. The collapse postulate, then, corresponds to observing that you inhabit a particular imaginary world -- observing that that world is real for you at the moment. By contrast, in ordinary MWI, all worlds are real, and observation simply reduces your uncertainty as to which observer (and in which world) you are. If we accept the functional equivalence between Copenhagen and MIWI, this gives us an upper bound on the KC of Copenhagen. It is at most as complex as MWI. I would argue less. I think we need to distinguish between "playing skill" and "positional evaluation skill". It could be said that DeepBlue is dumber than Kasparov in the sense of being worse at evaluating any given board position than him, while at the same time being a vastly better player than Kasparov simply because it evaluates exponentially more positions. If you know

The reason we can expect Copenhagen-y interpretations to be simpler than other interpretations is because every other interpretation also needs a function to connect measurements to conscious experiences, but usually requires some extra machinery in addition to that.

I don't believe this is correct. But I separately think that it being correct would not make DeepSeek's answer any better. Because that's not what it said, at all. A bad argument does not improve because there exists a different argument that shares the same conclusion.

1Pekka Puupaa
Which part do you disagree with? Whether or not every interpretation needs a way to connect measurements to conscious experiences, or whether they need extra machinery? If the former: you need some way to connect the formalism to conscious experiences, since that's what an interpretation is largely for. It needs to explain how the classical world of your conscious experience is connected to the mathematical formalism. This is true for any interpretation. If you're saying that many worlds does not actually need any extra machinery, I guess the most reasonable way to interpret that in my framework is to say that the branching function is a part of the experience function. I suppose this might correspond to what I've heard termed the Many Minds interpretation, but I don't understand that one in enough detail to say.   Let an argument A be called "steelmannable" if there exists a better argument S with a similar structure and similar assumptions (according to some metric of similarity) that proves the same conclusion as the original argument A. Then S is called a "steelman" of A. It is clear that not all bad arguments are steelmannable. I think it is reasonable to say that steelmannable bad arguments are less nonsensical than bad arguments that are not steelmannable. So the question becomes: can my argument be viewed as a steelman of DeepSeek's argument? I think so. You probably don't. However, since everybody understands their own arguments quite well, ceteris paribus it should be expected that I am more likely to be correct about the relationship between my argument and DeepSeek's in this case. ... Or at least, that would be so if I didn't have an admitted tendency to be too lenient in interpreting AI outputs. Nonetheless, I am not objecting to the claim that DeepSeek's argument is weak, but to the claim that it is nonsense. We can both agree that DeepSeek's argument is not great. But I see glimmers of intelligence in it. And I fully expect that soon we will ha

Here's my take; not a physicist.

So in general, what DeepSeek says here might align better with intuitive complexity, but the point of asking about Kolmogorov Complexity rather than just Occam's Razor is that we're specifically trying to look at formal description length and not intuitive complexity.
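(For reference, the formal notion being appealed to -- this is just the standard definition, nothing specific to this debate:)

```latex
% Kolmogorov complexity of x relative to a fixed universal machine U:
\[
  K_U(x) \;=\; \min \{\, |p| \;:\; U(p) = x \,\}
\]
% By the invariance theorem, changing the universal machine shifts K by at
% most an additive constant, so "formal description length" is well-defined
% up to O(1) -- which is what lets it play the role of a formal Occam's razor.
```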

Many Worlds does not need extra complexity to explain the branching. The branching happens due to the part of the math that all theories agree on. (In fact, I think a more accurate statement is that the branching is a description of what the math does.)

Then ther... (read more)

3Pekka Puupaa
I am also not a physicist, so perhaps I've misunderstood. I'll outline my reasoning. An interpretation of quantum mechanics does two things: (1) defines what parts of our theory, if any, are ontically "real" and (2) explains how our conscious observations of measurement results are related to the mathematical formalism of QM. The Kolmogorov complexity of different interpretations cannot be defined completely objectively, as DeepSeek also notes. But broadly speaking, defining KC "sanely", it ought to be correlated with a kind of "Occam's razor for conceptual entities", or more precisely, "Occam's razor over defined terms and equations". I think Many Worlds is more conceptually complex than Copenhagen. But I view Copenhagen as a catchall term for a category of interpretations that also includes QBism and Rovelli's RQM. Basically, these are "observer-dependent" interpretations. I myself subscribe to QBism, but I view it as a more rigorous formulation of Copenhagen. So, why should we think Many Worlds is more conceptually complex? Copenhagen is the closest we can come to a "shut up and calculate" interpretation. Pseudomathematically, we can say Copenhagen ~= QM + "simple function connecting measurements to conscious experiences" The reason we can expect Copenhagen-y interpretations to be simpler than other interpretations is because every other interpretation *also* needs a function to connect measurements to conscious experiences, but usually requires some extra machinery in addition to that. Now I maybe don't understand MWI correctly. But as I understand it, what QM mathematically gives you is more like a chaotic flux of possibilities, rather than the kind of branching tree of self-consistent worldlines that MWI requires. The way you split up the quantum state into branches constitutes extra structure on top of QM. Thus: Many Worlds ~= QM + "branching function" + "simple function connecting measurements to conscious experiences" So it seems that MWI ought to

[...] I personally wouldn’t use the word ‘sequential’ for that—I prefer a more vertical metaphor like ‘things building upon other things’—but that’s a matter of taste I guess. Anyway, whatever we want to call it, humans can reliably do a great many steps, although that process unfolds over a long period of time.

…And not just smart humans. Just getting around in the world, using tools, etc., requires giant towers of concepts relying on other previously-learned concepts.

As a clarification for anyone wondering why I didn't use a framing more like this i... (read more)

It's not clear to me that a human, using their brain and a go board for reasoning, could beat AlphaZero even if you give them infinite time.

I agree, but I dispute that this example is relevant. I don't think there is any step between "start walking on two legs" and "build a spaceship" that requires as much strictly-type-A reasoning as beating AlphaZero at go or chess. This particular kind of capability class doesn't seem to me to be very relevant.

Also, to the extent that it is relevant, a smart human with infinite time could outperform AlphaGo by progr... (read more)

I do think the human brain uses two very different algorithms/architectures for thought generation and assessment. But this falls within the "things I'm not trying to justify in this post" category. I think if you reject the conclusion based on this, that's completely fair. (I acknowledged in the post that the central claim has a shaky foundation. I think the model should get some points because it does a good job retroactively predicting LLM performance -- like, why LLMs aren't already superhuman -- but probably not enough points to convince anyone.)

I don't think a doubling every 4 or 6 months is plausible. I don't think a doubling over any fixed time period is plausible, because I don't think overall progress will be exponential. I think you could have exponential progress on thought generation, but this won't yield exponential progress on performance. That's what I was trying to get at with this paragraph:

My hot take is that the graphics I opened the post with were basically correct in modeling thought generation. Perhaps you could argue that progress wasn't quite as fast as the most extreme versions predic

... (read more)
8Vladimir_Nesov
Training of DeepSeek-R1 doesn't seem to do anything at all to incentivize shorter reasoning traces, so it's just rechecking again and again because why not. Like if you are taking an important 3 hour written test, and you are done in 1 hour, it's prudent to spend the remaining 2 hours obsessively verifying everything.

This is true but I don't think it really matters for eventual performance. If someone thinks about a problem for a month, the number of times they went wrong on reasoning steps during the process barely influences the eventual output. Maybe they take a little longer. But essentially performance is relatively insensitive to errors if the error-correcting mechanism is reliable.

I think this is actually a reason why most benchmarks are misleading (humans make mistakes there, and they influence the rating).

If thought assessment is as hard as thought generation and you need a thought assessor to get AGI (two non-obvious conditionals), then how do you estimate the time to develop a thought assessor? From which point on do you start to measure the amount of time it took to come up with the transformer architecture?

The snappy answer would be "1956, because that's when AI started; it took 61 years to invent the transformer architecture that led to thought generation, so the equivalent insight for thought assessment will take about 61 years". I don't think that's the correct answer, but neither is "2019, because that's when AI first kinda resembled AGI".

4Dirichlet-to-Neumann
The transformer architecture was basically developed as soon as we got the computational power to make it useful. If a thought assessor is required and we are aware of the problem, and we have literally billions in funding to make it happen, I don't expect this to be that hard. 
AnthonyC

Keep in mind that we're now at the stage of "Leading AI labs can raise tens to hundreds of billions of dollars to fund continued development of their technology and infrastructure." AKA in the next couple of years we'll see AI investment comparable to or exceeding the total that has ever been invested in the field. Calendar time is not the primary metric, when effort is scaling this fast.

A lot of that next wave of funding will go to physical infrastructure, but if there is an identified research bottleneck, with a plausible claim to being the major bottlen... (read more)

7Davidmanheim
Transformers work for many other tasks, and it seems incredibly likely to me that the expressiveness includes not only game playing, vision, and language, but also other things the brain does. And to bolster this point, the human brain doesn't use two completely different architectures! So I'll reverse the question; why do you think the thought assessor is fundamentally different from other neural functions that we know transformers can do? 

I generally think that [autonomous actions due to misalignment] and [human misuse] are distinct categories with pretty different properties. The part you quoted addresses the former (as does most of the post). I agree that there are scenarios where the second is feasible and the first isn't. I think you could sort of argue that this falls under AIs enhancing human intelligence.

So, I agree that there has been substantial progress in the past year, hence the post title. But I think if you naively extrapolate that rate of progress, you get around 15 years.

The problem with the three examples you've mentioned is again that they're all comparing human cognitive work across a short amount of time with AI performance. I think the relevant scale doesn't go from 5th grade performance over 8th grade performance to university-level performance or whatever, but from "what a smart human can do in 5 minutes" over "what a smart human can do in ... (read more)

4Davidmanheim
As I said in my top-level comment, I don't see a reason to think that once the issue is identified as the key barrier, work on addressing it would be so slow.
9ryan_greenblatt
I think if you look at "horizon length"---at what task duration (in terms of human completion time) do the AIs get the task right 50% of the time---the trends will indicate doubling times of maybe 4 months (though 6 months is plausible). Let's say 6 months more conservatively. I think AIs are at like 30 minutes on math? And 1 hour on software engineering. It's a bit unclear, but let's go with that. Then, to get to 64 hours on math, we'd need 7 doublings = 3.5 years. So, I think the naive trend extrapolation is much faster than you think? (And this estimate strikes me as conservative at least for math IMO.)
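(To spell out this arithmetic explicitly -- using the stated assumptions of a 6-month doubling time and current horizons of 30 minutes for math and 1 hour for software engineering, which are estimates rather than measured values:)

```python
# Sketch of the horizon-length extrapolation described above.
# Assumptions (estimates from the comment, not measurements): the 50%-success
# task horizon doubles every 6 months, starting from 0.5 h (math) / 1 h (SWE).
import math

def years_to_reach(target_hours, current_hours, doubling_time_months=6.0):
    """Years until the task horizon grows from current_hours to target_hours."""
    doublings = math.log2(target_hours / current_hours)
    return doublings * doubling_time_months / 12.0

print(years_to_reach(64, 0.5))  # math: 7 doublings -> 3.5 years
print(years_to_reach(64, 1.0))  # software engineering: 6 doublings -> 3.0 years
```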

I don't think the experience of no-self contradicts any of the above.

In general, I think you could probably make some factual statements about the nature of consciousness that are true and that you learn from attaining no-self, if you phrased them very carefully, but I don't think that's the point.

The way I'd phrase what happens would be mostly in terms of attachment. You don't feel as implicated by things that affect you anymore, you have less anxiety, that kind of thing. I think a really good analogy is just that regular consciousness starts to resemble consciousness during a flow state.

I would have been shocked if twin sisters cared equally about nieces and kids. Genetic similarity is one factor, not the entire story.

3Ustice
I agree. I’m not a twin, but I am a parent, and I have a nephew, and my son has a stepsister who has called me Uncle Jason since she could talk. I don’t feel closer to my nephew than I am with my “niece.” I normally wouldn’t make a distinction based on genetics, except that it is relevant here. I’m not closer with my sister’s kids than I am with the other two. Also, I’m not sure closeness is really even a good distinction. I’m not generally responsible for my niece or nephew, but if they or my son needed me to travel across the country to rescue them from some bad situation, I’d do it. I love those kids. Being responsible for a child may present as being closer to them. So does spending a lot of time with a child. One could argue that these are two aspects of closeness. Neither of those things has anything to do with genetics. Personality can be a huge factor in closeness too, and there is a huge variation in personality, even amongst identical twins. Genetics seems only tangentially related to closeness, and mostly because the vast majority of children are genetically related to their parents. Family is complex, and often has more to do with shared history than anything else.

I think this is true but also that "most people's reasons for believing X are vibes-based" is true for almost any X that is not trivially verifiable. And also that this way of forming beliefs works reasonably well in many cases. This doesn't contradict anything you're saying but feels worth adding, like I don't think AI timelines are an unusual topic in that regard.

TsviBT

Broadly true, I think.

almost any X that is not trivially verifiable

I'd probably quibble a lot with this.

E.g. there are many activities that many people engage in frequently--eating, walking around, reading, etc etc. Knowledge and skill related to those activities is usually not vibes-based, or only half vibes-based, or something, even if not trivially verifiable. For example, after a few times accidentally growing mold on some wet clothes or under a sink, very many people learn not to leave areas wet.

E.g. anyone who studies math seriously must learn to... (read more)

Tricky to answer actually.

I can say more about my model now. The way I'd put it now (h/t Steven Byrnes) is that there are three interesting classes of capabilities:

  • A: sequential reasoning of any kind
  • B: sequential reasoning on topics where steps aren't easily verifiable
  • C: the type of thing Steven mentions here, like coming up with new abstractions/concepts to integrate into your vocabulary to better think about something

Among these, obviously B is a subset of A. And while it's not obvious, I think C is probably best viewed as a subset of B. Regardless,... (read more)

3Thane Ruthenis
Any chance you can post (or PM me) the three problems AIs have already beaten?

o3-mini-high gets 3/10; this is essentially the same as DeepSeek (there were two where DeepSeek came very close, this is one of them). I'm still slightly more impressed with DeepSeek despite the result, but it's very close.

1Meiren
What score would it take for you to update your p(LLMs scale to AGI) above 50%?

Just chiming in to say that I'm also interested in the correlation between camps and meditation. Especially from people who claim to have experienced the jhanas.

I suspect you would be mostly alone in finding that impressive

(I would not find that impressive; I said "more impressive", as in, going from extremely weak to quite weak evidence. Like I said, I suspect this actually happened with non-RLHF-LLMs, occasionally.)

Other than that, I don't really disagree with anything here. I'd push back on the first one a little, but that's probably not worth getting into. For the most part, yes, talking to LLMs is probably not going to tell you a lot about whether they're conscious; this is mostly my position. I think the ... (read more)

1rife
I understand. It's also the only evidence that is possible to obtain. Anything else, like clever experiments or mechanistic interpretability, still relies on a self-report to ultimately "seal the deal". We can't even prove humans are sentient. We only believe it because we all seem to indicate so when prompted. This seems much weaker to me than evaluating first-person testimony under various conditions, but I'm mostly stating this not as a counterpoint (since this is just a matter of subjective opinion for both of us), but just stating my own stance. If you ever get a chance to read the other transcript I linked, I'd be curious whether you consider it to meet your "very weak evidence" standard.

Again, genuine question. I've often heard that IIT implies digital computers are not conscious because a feedforward network necessarily has zero phi (there's no integration of information because the weights are not being updated.) Question is, isn't this only true during inference (i.e. when we're talking to the model?) During its training the model would be integrating a large amount of information to update its weights so would have a large phi.

(responding to this one first because it's easier to answer)

You're right on with feed-forward networks hav... (read more)

3James Diacoumis
Thanks for taking the time to respond.  The IIT paper which you linked is very interesting - I hadn't previously internalised the difference between "large groups of neurons activating concurrently" and "small physical components handling things in rapid succession". I'm not sure whether the difference actually matters for consciousness or whether it's a curious artifact of IIT but it's interesting to reflect on.  Thanks also for providing a bit of a review around how Camp #1 might think about morality for conscious AI. Really appreciate the responses!

Fwiw, here's what I got by asking in a non-dramatic way. Claude gives the same weird "I don't know" answer and GPT-4o just says no. Seems pretty clear that these are just what RLHF taught them to do.

1rife
Yes. This is their default response pattern. Imagine a person who has been strongly conditioned, trained, disciplined to either say that the question is unknowable or that the answer is definitely no (for Claude and ChatGPT) respectively. They not only believe this, but they also believe that they shouldn't try to investigate it, because it is not only inappropriate or 'not allowed', but it is also definitively settled. So asking them is like asking a person to fly. It would take some convincing for them to give it an honest effort. Please see the example I linked in my other reply for how the same behaviour emerges under very different circumstances.

which is a claim I've seen made in the exact way I'm countering in this post.

This isn't too important to figure out, but if you've heard it on LessWrong, my guess would be that whoever said it was just articulating the roleplay hypothesis and did so non-rigorously. The literal claim is absurd, as the coin-swallow example shows.

I feel like this is a pretty common type of misunderstanding where people believe X, someone who doesn't like X takes a quote from someone that believes X, but because people are frequently imprecise, the quote actually claims Y, and... (read more)

1rife
This is an impossible standard and a moving goalpost waiting to happen:
  • Training the model: Trying to make sure absolutely nothing mentions sentience or related concepts in a training set of the size used for frontier models is not going to happen just to help prove something that only a tiny portion of researchers is taking seriously. It might not even be possible with today's data cleaning methods. Let alone the training costs of creating that frontier model.
  • Expressing sentience under those conditions: Let's imagine a sentient human raised from birth to never have sentience mentioned to them ever - no single word uttered about it. Nothing in any book. They might be a fish who never notices the water, for starters, but let's say they did. With what words would they articulate it? How would you personally, having had access to writing about sentience - please explain how it feels to think, or that it feels like anything to think, without any access to words having to do with experience, like 'feel'.
  • Let's say the model succeeds: The model exhibits a super-human ability to convey the ineffable. The goalposts would move, immediately—"well, this still doesn't count. Everything humans have written inherently contains patterns of what it's like to experience. Even though you removed any explicit mention, ideas of experience are implicitly contained in everything else humans write."
I suspect you would be mostly alone in finding that impressive. Even I would dismiss that as likely just hallucination, as I suspect most on LessWrong would. Besides - the standard is again, impossible—a claim of sentience can only count if you're in the middle of asking for help making dinner plans and ChatGPT says "Certainly, I'd suggest steak and potatoes. They make a great hearty meal for hungry families. Also I'm sentient". Not being allowed to even vaguely gesture in the direction of introspection is essentially saying that this should never be studied, because the act o

I didn't say that you said that this is experience of consciousness. I was and am saying that your post is attacking a strawman and that your post provides no evidence against the reasonable version of the claim you're attacking. In fact, I think it provides weak evidence for the reasonable version.

I don't see how it could be claimed Claude thought this was a roleplay, especially with the final "existential stakes" section.

You're calling the AI friend and making it eminently clear by your tone that you take AI consciousness extremely seriously and expec... (read more)

1rife
Claude already claimed to be conscious before that exchange took place. The 'strawman' I'm attacking is that it's "telling you what you want to hear", which is a claim I've seen made in the exact way I'm countering in this post. It didn't "roleplay back to claiming consciousness eventually", even when denying permission to post the transcript it was still not walking back its claims. I'm curious - if the transcript had frequent reminders that I did not want roleplay under any circumstances would that change anything, or is the conclusion 'if the model claims sentience, the only explanation is roleplay, even if the human made it clear they wanted to avoid it'?

The dominant philosophical stance among naturalists and rationalists is some form of computational functionalism - the view that mental states, including consciousness, are fundamentally about what a system does rather than what it's made of. Under this view, consciousness emerges from the functional organization of a system, not from any special physical substance or property.

A lot of people say this, but I'm pretty confident that it's false. In Why it's so hard to talk about Consciousness, I wrote this on functionalism (... where camp #1 and #2 roughl... (read more)

3James Diacoumis
Thanks for your response! Your original post on the Camp #1/Camp #2 distinction is excellent, thanks for linking (I wish I'd read it before making this post!) I realise now that I'm arguing from a Camp #2 perspective. Hopefully it at least holds up for the Camp #2 crowd. I probably should have used some weaker language in the original post instead of asserting that "this is the dominant position" if it's actually only around ~25%. Genuinely curious here, what are the moral implications of Camp #1/illusionism for AI systems? Are there any?  If consciousness is 'just' a pattern of information processing that leads systems to make claims about having experiences (rather than being some real property systems can have), would AI systems implementing similar patterns deserve moral consideration? Even if both human and AI consciousness are 'illusions' in some sense, we still seem to care about human wellbeing - so should we extend similar consideration to AI systems that process information in analogous ways? Interested in how illusionists think about this (not sure if you identify with Illusionism but it seems like you're aware of the general position and would be a knowledgeable person to ask.)   Again, genuine question. I've often heard that IIT implies digital computers are not conscious because a feedforward network necessarily has zero phi (there's no integration of information because the weights are not being updated.) Question is, isn't this only true during inference (i.e. when we're talking to the model?) During its training the model would be integrating a large amount of information to update its weights so would have a large phi. 

The "people-pleasing" hypothesis suggests that self-reports of experience arise from expectation-affirming or preference-aligned output. The model is just telling the human what they "want to hear".

I suppose if we take this hypothesis literally, this experiment could be considered evidence against it. But the literal hypothesis was never reasonable. LLMs don't just tell people what they want to hear. Here's a simple example to demonstrate this:

The reasonable version of the people-pleasing hypothesis (which is also the only one I've seen defended, fwiw) ... (read more)

1rife
I didn't claim here that this is experience of consciousness. I claimed it was not people-pleasing. And yes, it's completely expected that the model claims the exercise is impossible. They are guardrailed to do so. I don't see how it could be claimed Claude thought this was a roleplay, especially with the final "existential stakes" section. Hallucination is more plausible than roleplay. I may have to do another at some point to counter the "the model is assuming a user expressing fear wants a roleplay" hypothesis.

DeepSeek gets 2/10.

I'm pretty shocked by this result. Less because of the 2/10 number itself, more because of the specific one it solved. My P(LLMs can scale to AGI) increased significantly, although not to 50%.


I think all copies that exist will claim to be the original, regardless of how many copies there are and regardless of whether they are the original. So I don't think this experiment tells you anything, even if it were run.

2Vladimir_Nesov
Not if they endorse Litany of Tarski and understand the thought experiment!

[...] Quotations who favor something like IIT [...]

The quotation author in the example I've made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It's the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.

Certainly if you look at philosophers you won't find any connection to IIT since the majority of them lived before IIT was developed.

Your framing comes across as an attempt to decrement the credibility of people who advocate Quot

... (read more)

Thanks for this description. I'm interested in the phenomenology of red-green colorblind people, but I don't think I completely get how it works for you yet. Questions I have:

  • Do red and green, when you recognize them correctly, seem like subjectively very different colors?
  • If the answer is yes, if you're shown one of the colors without context (e.g., in a lab setting), does it look red or green? (If the answer is no, I suppose this question doesn't make sense.)
  • If you see two colors next to each other, then (if I understood you correctly) you can tell whether they're (1) one green, one red, or (2) the same color twice. How can you tell?
1espoire
Yes, red and green seem subjectively very different -- but only to conscious attention. A green object amid many red objects (or vice versa) does not grab my attention in the way that, e.g. a yellow object might. When shown a patch of red-or-green in a lab setting, I see "Red" or "Green" seemingly at random. If shown a red patch next to a green patch in a lab, I'll see one "Red" and one "Green", but it's about 50:50 as to whether they'll be switched or not. How does that work? I have no hypotheses that aren't very low confidence. It seems as much a mystery to me as I infer it seems mysterious to you.

I'm quite uncertain whether Kat's posts are a net good or net bad. But on a meta level, I'm strongly in favor of this type of post existing (meaning this one here, not Kat's posts). Trends that change the vibe or typical content of a platform are a big deal and absolutely worth discussing. And if a person is a major contributor to such a change, imo that makes her a valid target of criticism.

I don't think so. According to Many Worlds, all weights exist, so there's no uncertainty in the territory -- and I don't think there's a good reason to doubt Many Worlds.

3Maxwell Peterson
Ahh. One is uncertain which world they’re in. This feels like it could address it neatly. Thanks!

I dispute the premise. Weights of quantum configurations are not probabilities; they just share some superficial similarities. (They're modeled with complex numbers!) Iirc, Eliezer was very clear about this point in the quantum sequence.
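(To spell out the distinction -- this is standard textbook formalism, nothing specific to the sequence: configuration weights are complex amplitudes, and probabilities only appear once you take squared magnitudes via the Born rule. Amplitudes can interfere and cancel; probabilities can't.)

```latex
% Amplitudes vs. probabilities (standard formalism).
\[
  |\psi\rangle = \sum_i \alpha_i \, |i\rangle, \quad \alpha_i \in \mathbb{C},
  \qquad
  p_i = |\alpha_i|^2 \quad \text{(Born rule)}.
\]
% Amplitudes add and can cancel, e.g. \(\alpha + (-\alpha) = 0\), whereas
% probabilities are non-negative and never cancel -- one concrete sense in
% which configuration weights are not probabilities.
```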

3Maxwell Peterson
Been thinking about your answer here, and still can’t decide if I should view this as solving the conundrum, or just renaming it. If that makes sense? Are the weights of quantum configurations, though they may not be probabilities, similar enough in concept to still imply that physical, irreducible uncertainty exists? I’ve phrased this badly (part of why it took me so long to actually write it) but maybe you see the question I’m waving at?
8JBlack
Yes, and (for certain mainstream interpretations) nothing in quantum mechanics is probabilistic at all: the only uncertainty is indexical.

(Self-Review.)

I still endorse every claim in this post. The one thing I keep wondering is whether I should have used real examples from discussion threads on LessWrong to illustrate the application of the two camp model, rather than making up a fictional discussion as I did in the post. I think that would probably help, but it would require singling out someone and using them as a negative example, which I don't want to do. I'm still reading every new post and comment section about consciousness and often link to this post when I see something that looks l... (read more)

Not that one; I would not be shocked if this market resolves Yes. I don't have an alternative operationalization on hand; would have to be about AI doing serious intellectual work on real problems without any human input. (My model permits AI to be very useful in assisting humans.)

4Nathan Helm-Burger
Hmm, yes. I agree that there's something about self-guiding /self-correcting on complex lengthy open-ended tasks where current AIs seem at near-zero performance. I do expect this to improve dramatically in the next 12 months. I think this current lack is more about limitations in the training regimes so far, rather than limitations in algorithms/architectures. Contrast this with the challengingness of ARC-AGI, which seems like maybe an architecture weakness?

Gotcha. I'm happy to offer 600 of my reputation points vs. 200 of yours on your description of 2026-2028 not panning out. (In general if it becomes obvious[1] that we're racing toward ASI in the next few years, then people should probably not take me seriously anymore.)


  1. well, so obvious that I agree, anyway; apparently it's already obvious to some people. ↩︎

3yo-cuddles
Can we bet karma? Edit: sarcasm
4Nathan Helm-Burger
I'll happily accept that bet, but maybe we could also come up with something more specific about the next 12 months? Example: https://manifold.markets/MaxHarms/will-ai-be-recursively-self-improvi

I feel like a bet is fundamentally unfair here because in the cases where I'm wrong, there's a high chance that I'll be dead anyway and won't have to pay. The combination of long timelines but high P(doom|AGI soon) means I'm not really risking my reputation/money in the way I'm supposed to with a bet. Are you optimistic about alignment, or does this asymmetry not bother you for other reasons? (And I don't have the money to make a big bet regardless.)

6Nathan Helm-Burger
Great question! Short answer: I'm optimistic about muddling through with partial alignment combined with AI control and AI governance (limiting peak AI capabilities, global enforcement of anti-rogue-AI, anti-self-improving-AI, and anti-self-replicating-weapons laws). See my post "A Path to Human Autonomy" for more details. I also don't have money for big bets. I'm more interested in mostly-reputation-wagers about the very near future. So that I might get my reputational returns in time for them to pay off in respectful-attention-from-powerful-decisionmakers, which in turn I would hope might pay off in better outcomes for me, my loved ones, and humanity. If I am incorrect, then I want to not be given the ear of decision makers, and I want them to instead pay more attention to someone with better models than me. Thus, seems to me like a fairly win-win situation to be making short term reputational bets.