Are models like Opus 4.6 doing a similar thing to o1/o3 when reasoning?
There was a lot of talk about reasoning models like o1/o3 devolving into uninterpretable gibberish in their chains-of-thought, and about them being a fundamentally different kind of thing from previous LLMs. This was (to my understanding) one of the reasons only a summary of the thinking was made available.
But when I use models like Opus 4.5/4.6 with extended thinking, the chains-of-thought (appear to be?) fully reported, and completely legible.
I've just realised that I'm not sure what's going on here. Are models like Opus 4.6 closer to "vanilla" LLMs, or closer to o1/o3? Are they different in harnesses like Claude Code? Someone please enlighten me.
I don't think I really get what the objection is?
The way I think about it (ignoring the meta-anthropic thing) is that if for some reason every human who has ever lived or will ever live said aloud "I am in the final 95% of humans to be born", then trivially 95% of them would be correct. You are a human; if you say this aloud, there is a 95% chance you are correct; therefore doom.
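A toy check of that arithmetic, as a sketch (the population totals below are arbitrary placeholders, and this deliberately sets aside the reference-class question):

```python
def fraction_correct(total_humans: int) -> float:
    """If every human ever born says 'I am in the final 95% of humans to be born',
    return the fraction of them who are telling the truth."""
    cutoff = 0.05 * total_humans  # only the first 5% of birth ranks make the statement falsely
    correct = sum(1 for rank in range(1, total_humans + 1) if rank > cutoff)
    return correct / total_humans

# Whatever the total turns out to be, ~95% of speakers are right, so a speaker who
# knows nothing beyond their own humanity is right with ~95% probability.
for n in (1_000, 100_000):
    print(n, fraction_correct(n))
```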
I understand objections with regard to whether this is the correct reference class, but my understanding is that you think the above logic does not make sense. What am I missing?
I was excited to listen to this episode, but spent most of it tearing my hair out in frustration. A friend of mine who is a fan of Klein told me unprompted that when he was listening, he was lost and did not understand what Eliezer was saying. Eliezer seems to just not respond to the questions Klein asks, and instead diverts to analogies that bear no obvious relation to the question being asked. I don't think anyone unconvinced of AI risk will be convinced by this episode, and worse, I think they will come away believing the case is muddled and confusing and not really worth listening to.
This is not the first time I've felt this way listening to Eliezer speak to "normies". I think his writings are for the most part very clear, but his communication skills just do not seem to translate well to the podcast/live interview format.
This is kind of a sidenote and is not meant as an attack or criticism, but was GPT-5 used in the drafting of this post? I say this because I noticed a very heavy use of parentheses.
Dominic Cummings has claimed in a couple of interviews now that Hillary Clinton and/or John Kerry called the First Amendment a "historic error which we will fix after the election" in the weeks leading up to the 2024 election. See for instance this interview (timestamped where he says it). He is clearly implying that this is a direct quote. I'm generally quite sympathetic to Cummings, but I found this very hard to believe.
Indeed, I can't find any evidence of a quote from either Clinton or Kerry remotely like this. There was a CNN interview of Clinton from October 2024 where she called for the repeal of Section 230. There was also an interview with... (read more)
In French if you wanted to say e.g. "This person is my dad", you would say "Cette personne est mon père", so I think using "ma" here would be strongly biasing the model towards female categories of people.
Occasionally when meditating I stumble into a state that afterwards makes me wonder if it was a taste of “enlightenment”.
Can you tell me if the following word salad approximates any part of what enlightenment feels like to you?
I used to use em-dashes pretty often and have just resigned myself to not using them. At least a dozen times in the past couple of months I've rewritten a sentence to remove an em-dash. Which sucks!
I think the reason em-dashes became such an LLM tell is because they just weren't that common in pre-LLM writing. Parentheses are (I think?) a lot more common than em-dashes, so I would guess they won't be as reliable a signal of LLM text.
GPT-5 loves parentheses.
At the bottom of this post I've included a response to the prompt "Can you explain the chip export controls to China?". With this prompt, the model uses 11 sets of parentheses in a response of 417 words.
When we append "think hard" to the prompt, we get 36 sets of parentheses for a response of 1314 words.
As a control, we give the same prompt to Claude Sonnet 4 and get 1 set of parentheses for a response of 392 words.
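For what it's worth, here is a minimal sketch of how such a tally could be reproduced; the regex for non-nested parenthetical spans and the per-100-word rate are my own choices, not necessarily how the counts above were produced:

```python
import re

def paren_stats(text: str) -> tuple[int, int, float]:
    """Count non-nested (...) spans and words, and return the rate per 100 words."""
    spans = re.findall(r"\([^()]*\)", text)
    words = len(text.split())
    rate = 100 * len(spans) / words if words else 0.0
    return len(spans), words, rate

# Figures quoted above, expressed as a rate per 100 words:
#   GPT-5, plain prompt:        11 / 417  ≈ 2.6
#   GPT-5, "think hard":        36 / 1314 ≈ 2.7
#   Claude Sonnet 4 (control):   1 / 392  ≈ 0.3
```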
Obviously this is not a scientific or rigorous analysis, just an illustration of a pattern that becomes extremely obvious almost immediately when using GPT-5. This was the first prompt I checked... (read 2431 more words →)
I am confused about why this post on the ethics of eating honey is so heavily downvoted.
It sparked a bunch of interesting discussion in the comments (e.g. this comment by Habryka and the resulting arguments on how to weight non-human animal experiences).
It resulted in at least one interesting top-level rebuttal post.
I assume it led indirectly to this interesting short post, also about how to weight non-human experiences. (This might not have been downstream of the honey post, but it's a weird coincidence if it isn't.)
I think the original post certainly had flaws, but the fact that it's resulted in so much interesting and productive discussion and yet has been punished by the karma system seems weird to me.
I’m glad that there are radical activist groups opposed to AI development (e.g. StopAI, PauseAI). It seems good to raise the profile of AI risk to at least that of climate change, and it’s plausible that these kinds of activist groups help do that.
But I find that I really don’t enjoy talking to people in these groups, as they seem generally quite ideological, rigid and overconfident. (They are generally more pleasant to talk to than e.g. climate activists in my opinion, though. And obviously there are always exceptions.)
I also find a bunch of activist tactics very irritating aesthetically (e.g. interrupting speakers at events).
I feel some cognitive dissonance between these two points of view.
Here are a cluster of things. Does this cluster have a well-known name?
Some related concepts: self-fulfilling prophecy, herding, preference falsification
Idea: personal placebo-controlled drug trial kits
Motivation: anecdotally, it seems like lots of supplements/nootropics (L-theanine, magnesium, melatonin) work very well for some people, not well for others, and very well for a bit before no longer working for yet others. Personally, I have tried a bunch of these and found it hard to distinguish any purported effect from placebo. Clinical trials are also often low quality, and there are plausible reasons a drug might affect some people a lot and others not so much.
I think it would be super useful to be given 60 indistinguishable pills in a numbered blister pack, half placebo and half active, along with some simple online tool for logging the pill number plus some basic measures of anxiety/depression/sleep quality, so that you can check how the drug affected you modulo placebo.
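As a rough sketch of what the analysis side could look like once you unblind (all scores below are made up for illustration, and a permutation test is just one reasonable choice):

```python
import random
from statistics import mean

def permutation_test(active: list[float], placebo: list[float], n_iter: int = 10_000) -> float:
    """Two-sided permutation test for a difference in mean score between active and placebo pills."""
    observed = abs(mean(active) - mean(placebo))
    pooled = active + placebo
    hits = 0
    for _ in range(n_iter):
        random.shuffle(pooled)
        shuffled_active = pooled[:len(active)]
        shuffled_placebo = pooled[len(active):]
        if abs(mean(shuffled_active) - mean(shuffled_placebo)) >= observed:
            hits += 1
    return hits / n_iter  # p-value: how often chance alone produces a gap at least this large

# Made-up example: nightly sleep-quality ratings (1-10), matched to pill numbers after unblinding
active_scores = [7.0, 8.0, 6.0, 9.0, 7.0, 8.0]    # pills that turned out to be the real drug
placebo_scores = [6.0, 7.0, 6.0, 7.0, 5.0, 6.0]   # pills that turned out to be placebo
print(permutation_test(active_scores, placebo_scores))
```

A permutation test is appealing here because it makes no assumptions about the distribution of your self-reported scores, which for an n-of-1 experiment like this will be small and noisy.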
I would guess that the market for this would be quite small. But if anyone wants to make this product, I commit to buying at least one!
I have an ADHD dilemma.
TL;DR: I definitely have things wrong with me, and it seems that those things intersect substantially but not completely with "ADHD". I have no idea how to figure these things out without going bankrupt.
In longer form:
There are a couple of examples of people claiming that they played the AI box game as Gatekeeper, and ended up agreeing to let the other player out of the box (e.g. https://www.lesswrong.com/posts/Bnik7YrySRPoCTLFb/i-played-the-ai-box-game-as-the-gatekeeper-and-lost).
The original version of this game as defined by Eliezer includes a clause that neither player will talk about what was discussed, but it seems perfectly reasonable to play a variant without this rule.
Does anyone know of an example of a boxed player winning where some transcript or summary was released afterwards?
I have a weakly held hypothesis that one reason no such transcript exists is that the argument that ends up working is something along the lines... (read more)
This is a video that randomly appeared in my YouTube recommendations, and it's one of the most strange and moving pieces of art I've seen in a long time. It's about animal welfare (?), but I really don't know how to describe it any further. Please watch it if you have some spare time!
Hmm, but when you use these models in the chat interface, you can literally open up the reasoning tab and watch it be generated in real time? It feels like there isn’t enough time here for that reasoning to have been generated by a summarizer.