I don't think I get it. If I read this graph correctly, it seems to say that if you let a human play chess against an engine and want it to achieve equal performance, then the amount of time the human needs to think grows exponentially (as the engine gets stronger). This doesn't make sense if extrapolated downward, but upward it's about what I would expect. You can compensate for skill by applying more brute force, but it becomes exponentially costly, which fits the exponential graph.
It's probably not perfect -- I'd worry a lot about strategic mistakes in the opening -- but it seems pretty good. So I don't get how this is an argument against the metric.
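To spell out the extrapolation I have in mind, here's a back-of-the-envelope toy model (the constants are entirely made up by me, not read off the graph): assume a fixed-skill human has to double their thinking time for every ~100 Elo of extra engine strength to stay even.

```python
# Toy model only: made-up constants, not fitted to the actual graph.
def human_time_minutes(engine_elo, human_elo=2000, base_minutes=1.0,
                       elo_per_doubling=100):
    """Thinking time a fixed-skill human needs to stay even with a stronger engine."""
    gap = engine_elo - human_elo
    return base_minutes * 2 ** (gap / elo_per_doubling)

for elo in (2000, 2400, 2800, 3200):
    print(elo, human_time_minutes(elo))
# 2000 -> 1.0, 2400 -> 16.0, 2800 -> 256.0, 3200 -> 4096.0 minutes
```

Upward, this matches the "brute force becomes exponentially costly" intuition; downward, it gives absurdly tiny thinking times, which is the part that doesn't make sense.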
Not answerable because METR is a flawed measure, imho.
Should I not have begun by talking about background information & explaining my beliefs? Should I have assumed the audience had contextual awareness and gone right into talking about solutions? Or was the problem more along the lines of writing quality, tone, or style?
- What type of post do you like reading?
- Would it be alright if I asked for an example so that I could read it?
This is a completely wrong way to think about it, imo. A post isn't this thing with inherent terminal value that you can optimize for regardless of content.
If you think you have an i...
I really don't think this is a reasonable measure of the ability to do long-term tasks, but I don't have the time or energy to fight this battle, so I'll just register my prediction that this paper is not going to age well.
To offer another data point, I guess: I've had an obsessive nail-removing[1] habit for about 20 years. I concur that it can happen unconsciously; however, noticing it seems to me like 10-20% of the problem; the remaining 80-90% is resisting the urge to follow the habit when you do notice. (As for enjoying it, technically yes, but it's for such a short amount of time that it's never worth it. Maybe if you just gave in and were constantly biting instead of trying to resist for as long as possible, it'd be different.) I also think I've solved the notici...
Oh, nice! The fact that you didn't make the time explicit in the post made me suspect that it was probably much shorter. But yeah, six months is long enough, imo.
I would highly caution declaring victory too early. I don't know for how long you think you've overcome the habit, but unless it's at least three months, I think you're being premature.
That’s why I waited six months before publishing the post :)
A larger number of people, I think, desperately desperately want LLMs to be a smaller deal than what they are.
Can confirm that I'm one of these people (and yes, I worry a lot about this clouding my judgment).
Again, those are theories of consciousness, not definitions of consciousness.
I would agree that people who use consciousness to denote the computational process vs. the fundamental aspect generally have different theories of consciousness, but they're also using the term to denote two different things.
(I think this is because consciousness is notably different from other phenomena -- e.g., fiber decreasing the risk of heart disease -- where the phenomenon is relatively uncontroversial and only the theory about how the phenomenon is explained is up for debate. With ...
I think the ability to autonomously find novel problems to solve will emerge as reasoning models scale up. It will emerge because it is instrumental to solving difficult problems.
This of course is not a sufficient reason. (Demonstration: telepathy will emerge [as evolution improves organisms] because it is instrumental to navigating social situations.) It being instrumental means that there is an incentive -- or to be more precise, a downward slope in the loss function toward areas of model space with that property -- which is one required piece, but it...
Instead of "have LLMs generated novel insights", how about "have LLMs demonstrated the ability to identify which views about a non-formal topic make more or less sense?" This question seems easier to operationalize and I suspect points at a highly related ability.
Fwiw this is the kind of question that has definitely been answered in the training data, so I would not count this as an example of reasoning.
I'm just not sure the central claim, that rationalists underestimate the role of luck in intelligence, is true. I've never gotten that impression. At least my assumption going into reading this was already that intelligence was probably 80-90% unearned.
Humans must have gotten this ability from somewhere and it's unlikely the brain has tons of specialized architecture for it.
This is probably a crux; I think the brain does have tons of specialized architecture for it, and if I didn't believe that, I probably wouldn't think thought assessment was as difficult.
The thought generator seems more impressive/fancy/magic-like to me.
Notably people's intuitions about what is impressive/difficult tend to be inversely correlated with reality. The stereotype is (or at least used to be) that AI will be good at ra...
Whether or not every interpretation needs a way to connect measurements to conscious experiences, or whether they need extra machinery?
If we're being extremely pedantic, then KC is about predicting conscious experience (or sensory input data, if you're an illusionist; one can debate what the right data type is). But this only matters for discussing things like Boltzmann brains. As soon as you assume that there exists an external universe, you can forget about your personal experience and just try to estimate the length of the program that runs the univ...
The reason we can expect Copenhagen-y interpretations to be simpler than other interpretations is because every other interpretation also needs a function to connect measurements to conscious experiences, but usually requires some extra machinery in addition to that.
I don't believe this is correct. But I separately think that it being correct would not make DeepSeek's answer any better. Because that's not what it said, at all. A bad argument does not improve because there exists a different argument that shares the same conclusion.
Here's my take; not a physicist.
So in general, what DeepSeek says here might align better with intuitive complexity, but the point of asking about Kolmogorov Complexity rather than just Occam's Razor is that we're specifically trying to look at formal description length and not intuitive complexity.
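As a toy illustration of that distinction (my own example, nothing DeepSeek said): output that looks complicated can still have a very short generating program, which is exactly the gap between intuitive complexity and formal description length.

```python
import random

# A million apparently patternless bits...
random.seed(0)
looks_complicated = "".join(random.choice("01") for _ in range(1_000_000))

# ...which are regenerated exactly by the few short lines above, so the
# shortest description of this object is tiny even though it looks complex.
print(len(looks_complicated))  # 1000000
```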
Many Worlds does not need extra complexity to explain the branching. The branching happens due to the part of the math that all theories agree on. (In fact, I think a more accurate statement is that the branching is a description of what the math does.)
Then ther...
[...] I personally wouldn’t use the word ‘sequential’ for that—I prefer a more vertical metaphor like ‘things building upon other things’—but that’s a matter of taste I guess. Anyway, whatever we want to call it, humans can reliably do a great many steps, although that process unfolds over a long period of time.
…And not just smart humans. Just getting around in the world, using tools, etc., requires giant towers of concepts relying on other previously-learned concepts.
As a clarification for anyone wondering why I didn't use a framing more like this i...
It's not clear to me that a human, using their brain and a go board for reasoning, could beat AlphaZero even if you give them infinite time.
I agree but I dispute that this example is relevant. I don't think there is any step in between "start walking on two legs" to "build a spaceship" that requires as much strictly-type-A reasoning as beating AlphaZero at go or chess. This particular kind of capability class doesn't seem to me to be very relevant.
Also, to the extent that it is relevant, a smart human with infinite time could outperform AlphaGo by progr...
I do think the human brain uses two very different algorithms/architectures for thought generation and assessment. But this falls within the "things I'm not trying to justify in this post" category. I think if you reject the conclusion based on this, that's completely fair. (I acknowledged in the post that the central claim has a shaky foundation. I think the model should get some points because it does a good job retroactively predicting LLM performance -- like, why LLMs aren't already superhuman -- but probably not enough points to convince anyone.)
I don't think a doubling every 4 or 6 months is plausible. I don't think a doubling over any fixed time interval is plausible, because I don't think overall progress will be exponential. I think you could have exponential progress on thought generation, but this won't yield exponential progress on performance. That's what I was trying to get at with this paragraph:
...My hot take is that the graphics I opened the post with were basically correct in modeling thought generation. Perhaps you could argue that progress wasn't quite as fast as the most extreme versions predicted...
This is true but I don't think it really matters for eventual performance. If someone thinks about a problem for a month, the number of times they went wrong on reasoning steps during the process barely influences the eventual output. Maybe they take a little longer. But essentially performance is relatively insensitive to errors if the error-correcting mechanism is reliable.
I think this is actually a reason why most benchmarks are misleading (humans make mistakes there, and they influence the rating).
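Here's a minimal sketch of what I mean (a toy model I just made up, not anything from the benchmarks): each reasoning step can go wrong, but if a reliable checker catches mistakes and the step is redone, errors cost time rather than correctness.

```python
import random

def solve(n_steps=100, p_error=0.2, seed=0):
    """Count attempts when every flagged mistake is simply redone."""
    rng = random.Random(seed)
    attempts = 0
    for _ in range(n_steps):
        while True:
            attempts += 1
            if rng.random() > p_error:  # step done correctly
                break
            # otherwise the (assumed reliable) checker catches it and we retry
    return attempts

print(solve(p_error=0.0))  # 100 attempts, no retries
print(solve(p_error=0.2))  # ~125 attempts: slower, but the final output is the same
```

The output is identical in both runs; only the time changes, which is the sense in which eventual performance is insensitive to the error rate.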
If thought assessment is as hard as thought generation and you need a thought assessor to get AGI (two non-obvious conditionals), then how do you estimate the time to develop a thought assessor? From which point on do you start to measure the amount of time it took to come up with the transformer architecture?
The snappy answer would be "1956 because that's when AI started; it took 61 years to invent the transformer architecture that led to thought generation, so the equivalent insight for thought assessment will take about 61 years". I don't think that's the correct answer, but neither is "2019 because that's when AI first kinda resembled AGI".
Keep in mind that we're now at the stage of "Leading AI labs can raise tens to hundreds of billions of dollars to fund continued development of their technology and infrastructure." AKA in the next couple of years we'll see AI investment comparable to or exceeding the total that has ever been invested in the field. Calendar time is not the primary metric, when effort is scaling this fast.
A lot of that next wave of funding will go to physical infrastructure, but if there is an identified research bottleneck, with a plausible claim to being the major bottlen...
I generally think that [autonomous actions due to misalignment] and [human misuse] are distinct categories with pretty different properties. The part you quoted addresses the former (as does most of the post). I agree that there are scenarios where the second is feasible and the first isn't. I think you could sort of argue that this falls under AIs enhancing human intelligence.
So, I agree that there has been substantial progress in the past year, hence the post title. But I think if you naively extrapolate that rate of progress, you get around 15 years.
The problem with the three examples you've mentioned is again that they're all comparing human cognitive work across a short amount of time with AI performance. I think the relevant scale doesn't go from 5th-grade performance through 8th-grade performance to university-level performance or whatever, but from "what a smart human can do in 5 minutes" through "what a smart human can do in ...
I don't think the experience of no-self contradicts any of the above.
In general, I think you could probably make some factual statements about the nature of consciousness that are true and that you learn from attaining no-self, if you phrased them very carefully, but I don't think that's the point.
The way I'd phrase what happens would be mostly in terms of attachment. You don't feel as implicated by things that affect you anymore, you have less anxiety, that kind of thing. I think a really good analogy is just that regular consciousness starts to resemble consciousness during a flow state.
I would have been shocked if twin sisters cared equally about nieces and kids. Genetic similarity is one factor, not the entire story.
I think this is true but also that "most people's reasons for believing X are vibes-based" is true for almost any X that is not trivially verifiable. And also that this way of forming beliefs works reasonably well in many cases. This doesn't contradict anything you're saying but feels worth adding, like I don't think AI timelines are an unusual topic in that regard.
Broadly true, I think.
almost any X that is not trivially verifiable
I'd probably quibble a lot with this.
E.g. there are many activities that many people engage in frequently--eating, walking around, reading, etc etc. Knowledge and skill related to those activities is usually not vibes-based, or only half vibes-based, or something, even if not trivially verifiable. For example, after a few times accidentally growing mold on some wet clothes or under a sink, very many people learn not to leave areas wet.
E.g. anyone who studies math seriously must learn to...
Tricky to answer actually.
I can say more about my model now. The way I'd put it now (h/t Steven Byrnes) is that there are three interesting classes of capabilities
Among these, obviously B is a subset of A. And while it's not obvious, I think C is probably best viewed as a subset of B. Regardless,...
o3-mini-high gets 3/10; this is essentially the same as DeepSeek (there were two where DeepSeek came very close, this is one of them). I'm still slightly more impressed with DeepSeek despite the result, but it's very close.
Just chiming in to say that I'm also interested in the correlation between camps and meditation. Especially from people who claim to have experienced the jhanas.
I suspect you would be mostly alone in finding that impressive
(I would not find that impressive; I said "more impressive", as in, going from extremely weak to quite weak evidence. Like I said, I suspect this actually happened with non-RLHF-LLMs, occasionally.)
Other than that, I don't really disagree with anything here. I'd push back on the first one a little, but that's probably not worth getting into. For the most part, yes, talking to LLMs is probably not going to tell you a lot about whether they're conscious; this is mostly my position. I think the ...
Again, genuine question. I've often heard that IIT implies digital computers are not conscious because a feedforward network necessarily has zero phi (there's no integration of information because the weights are not being updated). The question is, isn't this only true during inference (i.e., when we're talking to the model)? During its training, the model would be integrating a large amount of information to update its weights, and so would have a large phi.
(responding to this one first because it's easier to answer)
You're right on with feed-forward networks hav...
Fwiw, here's what I got by asking in a non-dramatic way. Claude gives the same weird "I don't know" answer and GPT-4o just says no. Seems pretty clear that these are just what RLHF taught them to do.
which is a claim I've seen made in the exact way I'm countering in this post.
This isn't too important to figure out, but if you've heard it on LessWrong, my guess would be that whoever said it was just articulating the roleplay hypothesis and did so non-rigorously. The literal claim is absurd, as the coin-swallow example shows.
I feel like this is a pretty common type of misunderstanding where people believe X, someone who doesn't like X takes a quote from someone who believes X, but because people are frequently imprecise, the quote actually claims Y, and...
I didn't say that you said that this is experience of consciousness. I was and am saying that your post is attacking a strawman and that your post provides no evidence against the reasonable version of the claim you're attacking. In fact, I think it provides weak evidence for the reasonable version.
I don't see how it could be claimed Claude thought this was a roleplay, especially with the final "existential stakes" section.
You're calling the AI "friend" and making it eminently clear by your tone that you take AI consciousness extremely seriously and expec...
The dominant philosophical stance among naturalists and rationalists is some form of computational functionalism - the view that mental states, including consciousness, are fundamentally about what a system does rather than what it's made of. Under this view, consciousness emerges from the functional organization of a system, not from any special physical substance or property.
A lot of people say this, but I'm pretty confident that it's false. In Why it's so hard to talk about Consciousness, I wrote this on functionalism (... where camp #1 and #2 roughl...
The "people-pleasing" hypothesis suggests that self-reports of experience arise from expectation-affirming or preference-aligned output. The model is just telling the human what they "want to hear".
I suppose if we take this hypothesis literally, this experiment could be considered evidence against it. But the literal hypothesis was never reasonable. LLMs don't just tell people what they want to hear. Here's a simple example to demonstrate this:
The reasonable version of the people-pleasing hypothesis (which is also the only one I've seen defended, fwiw) ...
Deepseek gets 2/10.
I'm pretty shocked by this result, less by the 2/10 number itself than by the specific one it solved. My P(LLMs can scale to AGI) increased significantly, although not to 50%.
I think all copies that exist will claim to be the original, regardless of how many copies there are and regardless of whether they are the original. So I don't think this experiment tells you anything, even if it were run.
[...] Quotations who favor something like IIT [...]
The quotation author in the example I've made up does not favor IIT. In general, I think IIT represents a very small fraction (< 5%, possibly < 1%) of Camp #2. It's the most popular theory, but Camp #2 is extremely heterogeneous in their ideas, so this is not a high bar.
Certainly if you look at philosophers you won't find any connection to IIT since the majority of them lived before IIT was developed.
...Your framing comes across as an attempt to decrement the credibility of people who advocate Quot
Thanks for this description. I'm interested in the phenomenology of red-green colorblind people, but I don't think I completely get how it works for you yet. Questions I have:
I'm quite uncertain whether Kat's posts are a net good or net bad. But on a meta level, I'm strongly in favor of this type of post existing (meaning this one here, not Kat's posts). Trends that change the vibe or typical content of a platform are a big deal and absolutely worth discussing. And if a person is a major contributor to such a change, imo that makes her a valid target of criticism.
I don't think so. According to Many Worlds, all weights exist, so there's no uncertainty in the territory -- and I don't think there's a good reason to doubt Many Worlds.
I dispute the premise. Weights of quantum configurations are not probabilities, they just share some superficial similarities. (They're modeled with complex numbers!) Iirc Eliezer was very clear about this point in the quantum sequence.
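A quick toy example of why the distinction matters (my own illustration, not something from the sequence): amplitudes for two paths into the same outcome can cancel, which probabilities can never do.

```python
import cmath

# Two paths into the same outcome: equal magnitude, opposite phase.
a1 = 1 / 2 ** 0.5
a2 = cmath.exp(1j * cmath.pi) / 2 ** 0.5

p_amplitudes = abs(a1 + a2) ** 2                  # add amplitudes, then square
p_if_probabilities = abs(a1) ** 2 + abs(a2) ** 2  # what adding probabilities would give

print(round(p_amplitudes, 10))        # 0.0 -- destructive interference
print(round(p_if_probabilities, 10))  # 1.0 -- no cancellation possible
```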
(Self-Review.)
I still endorse every claim in this post. The one thing I keep wondering is whether I should have used real examples from discussion threads on LessWrong to illustrate the application of the two camp model, rather than making up a fictional discussion as I did in the post. I think that would probably help, but it would require singling out someone and using them as a negative example, which I don't want to do. I'm still reading every new post and comment section about consciousness and often link to this post when I see something that looks l...
Not that one; I would not be shocked if this market resolves Yes. I don't have an alternative operationalization on hand; would have to be about AI doing serious intellectual work on real problems without any human input. (My model permits AI to be very useful in assisting humans.)
Gotcha. I'm happy to offer 600 of my reputation points vs. 200 of yours on your description of 2026-2028 not panning out. (In general if it becomes obvious[1] that we're racing toward ASI in the next few years, then people should probably not take me seriously anymore.)
well, so obvious that I agree, anyway; apparently it's already obvious to some people.
I feel like a bet is fundamentally unfair here because in the cases where I'm wrong, there's a high chance that I'll be dead anyway and don't have to pay. The combination of long timelines but high P(doom|AGI soon) means I'm not really risking my reputation/money in the way I'm supposed to with a bet. Are you optimistic about alignment, or does this asymmetry not bother you for other reasons? (And I don't have the money to make a big bet regardless.)
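To make the asymmetry concrete with some made-up numbers (not my actual credences):

```python
# Made-up illustrative numbers, not my actual credences.
p_agi_soon = 0.20               # worlds where I lose the bet
p_dead_given_agi_soon = 0.75    # of those, worlds where I'm not around to pay

p_i_actually_pay = p_agi_soon * (1 - p_dead_given_agi_soon)  # 0.05
p_i_collect = 1 - p_agi_soon                                 # 0.80

print(p_i_actually_pay, p_i_collect)
# I'd only pay out in ~5% of worlds but collect in ~80%, so the stakes
# don't discipline my beliefs the way a bet is supposed to.
```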
For those who work on Windows, a nice little quality-of-life improvement for me was just to hide desktop icons and do everything by searching in the taskbar. (Would be even better if the search function wasn't so odd.) Been doing this for about two years and like it much more.
Maybe for others, using the desktop is actually worth it, but for me, it always got cluttered over time, and the annoyance over it not looking the way I want always outweighed the benefits. It really takes barely longer to go CTRL+ESC+"firef"+ENTER than to double-click an icon.