I'm seeing more sophisticated LLM-slop in the LW moderation queue.
Eight months ago, I wrote "hey, we're getting tons of AI-psychosis'd people, deluded into thinking their crackpot coherence/spiralism/emergence/ChatGPTAwakening experience is true and meaningful. We process like 15-20 of these a day."
Nowadays we still get some of those, but a lot less. Instead, I think we now often get a somewhat more sophisticated-looking kind of LLM slop. It's often something like:
"Me and ChatGPT have been working on some ML experiments for months, checking if we get s...
I think it will not reach the standard of a "sufficiently-good alignment researcher" who can make meaningful conceptual progress until Near The End.
If it turns out that AIs can make meaningful conceptual progress on alignment soon, but people get discouraged from using AIs for this, or discouraged from publishing their output, that could be a really costly mistake.
I would suggest thinking about how to scale up and improve the evaluation pipeline so it can handle a much greater content volume. For example: My assumption is that you prefer human-writt...
Things that have been successfully-so-far banned before being done (very shallow research, not sure; found w/ gippities and cursorily (ha) sanity-checked):
In crime shows and books they often talk about Means, Motive, and Opportunity... I suspect at least one is missing from each example on your list.
Military Moon Bases. The Opportunity requires a well-established space program with regular, or at least imminent, lunar visits. The Means is tremendous amounts of resources, which diminishes the Motive: the higher the opportunity cost, the higher the returns need to be. What is cheaper to do on the Moon than on Earth, to the point where it becomes a profitable venture?
How many of these bans have held af...
(Beta announcement to get some testing/feedback before I post this to main. Please report bugs/UX friction/perf issues and feature requests, here if you want others to see and discuss, or on Github if minor.)
I present a browser userscript to help keep up with LW/EAF content (it's a way to view all recent comments/posts, with a lot of helpful features), and to save, read, and search a user's entire LW/EAF output.
Quick Start: Install Tampermonkey first (links for Chrome, Firefox, Edge, Safari), then cli...
Thank you, I've taken your suggestion and removed all uses of Math.random().
There are two common models of space colonization people sometimes allude to, neither of which I think is particularly likely.
Model 1 (“normal colonization”) is that space colonization will look something like Earth colonization, e.g. the way the first humans expanded across the Polynesian islands. So your boat (rover/ship/probe) hops to one island (planet), you build up a civilization, and then you send your probes onward to the next couple of nearby planets, maybe saving up a bunch of resources once you've colonized nearby star systems (e.g. your galaxy) and ...
The furthest parts of the theoretically reachable universe are 16-18 billion years of travel away, so a 100-year delay is worth it if you can increase your speed by just one 100-millionth of c.
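A back-of-the-envelope check of that arithmetic (my numbers, ignoring cosmological expansion and treating travel time as distance over speed): for a trip of duration $T$ at speed $v$, a small speed gain $\Delta v$ saves

$$\Delta T \approx T \cdot \frac{\Delta v}{v} \approx (16 \times 10^{9}\ \text{yr}) \times 10^{-8} = 160\ \text{yr},$$

which beats the 100-year launch delay.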
Why is there a tradeoff? Why don't you launch your early comparatively technologically-unsophisticated probes as soon as you can, and then, if you develop faster probes, also launch those if you calculate that they could catch up to the ones that you already launched?
It's not like the resources spent on early probes trade off appreciably with technological development.
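To make the catch-up condition concrete (a simple kinematic sketch, again ignoring expansion): if the slow probe launches at speed $v_1$ and the faster one launches $\Delta t$ later at speed $v_2 > v_1$, the second overtakes the first a time

$$t_{\text{catch}} = \frac{v_1 \, \Delta t}{v_2 - v_1}$$

after its own launch. Any finite speed advantage eventually closes the gap, so launching early costs little beyond the probes themselves.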
Claude doesn’t get it
In all my interactions with AI, there has been one recurring problem that doesn’t seem to go away: they don’t get it. I don’t know how to explain this in any better language than that. I don’t know how to create a “Get It” benchmark. But whenever I talk to Claude, ChatGPT, Gemini, or any other model about a concept, the longer the interaction lasts, the more I get the sense that it doesn’t really “get it”. In this way, I think AI skeptics are actually pointing to something real when they say these models aren’t “real intelligence”. A lot of ...
I think about the canonical Reality Has a Surprising Amount of Detail post a lot when trying to automate tasks with LLMs. In particular, any given task has many granular details, most of which don’t come to mind before making contact with reality oneself. The most common failure mode I run into is failing to specify details that I hadn’t even *realized* were relevant and could be messed up, until the model encountered the situation and messed them up. This also seems relevant when thinking about the transition from verifiable math and codi...
I must admit a poverty of imagination; I can’t see how it can be automated. That would be amazing if it could be.
However, the circumstances of each problem or LLM request are always so unique that few heuristics generalize beyond certain vague guardrails that apply to all problem solving/advice giving (in my experience these take the form of the questions: What have you tried already? Why did you try it that way, and what did you expect to happen? What happened instead?). I see the ritual as attempting to explain why this situation is really unique and different – which seems to me to...
A fun riddle I was shocked to see the gippities solve without extended thinking or much yapping, even. I gave up on the third!
Here's the riddle, as stated to Opus 4.6 on an empty context window. "Consider single-word-name countries. An inclusive pair is when the name of one country is contained in another wholly. There are three such pairs. Find them."
I was surprised not only by how quickly it solved it, but also by the lack of thinking tokens. Gemini 3.1 Fast also did it. And by the unusual order in which the solutions were produced, in the exact re...
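For anyone who wants to verify it mechanically (spoilers in the final comment), here's a brute-force sketch in Python over a toy subset of names; a real check would use the full roster of single-word country names:

```python
# Toy subset of single-word country names; swap in the full list
# (e.g. from the pycountry package) for a real check.
countries = ["Niger", "Nigeria", "Oman", "Romania", "Mali",
             "Somalia", "Chad", "India", "Iran", "China"]

# A pair (a, b) is "inclusive" when a's name appears wholly inside b's.
pairs = [(a, b) for a in countries for b in countries
         if a != b and a.lower() in b.lower()]

print(pairs)
# [('Niger', 'Nigeria'), ('Oman', 'Romania'), ('Mali', 'Somalia')]
```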
PSA: Anthropic models don't seem to particularly privilege the explicit thinking field. This makes reinforcement spillover—where training on a model's outputs generalizes to the CoT, making it appear safer—more likely.
While Anthropic models do have a separate explicit thinking field, they don't really use thinking that differently from outputs and aren't that dependent on the thinking field. Sometimes they'll just do their thinking in the output field; the way they talk in the thinking field isn't very distinct from how they talk in outputs; and I believe...
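For readers who haven't used it: the "explicit thinking field" here is the separate content block Anthropic's API returns when extended thinking is enabled. A minimal sketch with the Python SDK (the model ID is a placeholder, and the prompt is just for illustration):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; any extended-thinking model
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": "Is 9.11 larger than 9.9?"}],
)

# The response interleaves "thinking" blocks (the explicit CoT field)
# with ordinary "text" blocks. The observation above is that the
# model's style and content in the two block types aren't sharply distinct.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print("[output]", block.text)
```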
This comment thread on 1a3orn’s post (a collection of various models exhibiting degenerate language usage), together with Jozdien’s paper (which has since come out: Reasoning Models Sometimes Output Illegible Chains of Thought), is I think strong evidence that you don’t get human-legible English by default from outcome-based RL.
I'm somewhat skeptical of that paper's interpretation of the observations it reports, at least for R1 and R1-Zero.
I have used these models a lot through OpenRouter (which is what Jozdien used), and in my experience:
it feels rude to talk about specifics about other people. at a broad level, there are some people I've gained a lot of respect for. it's easy for people to say they care about safety, so I don't weigh that very heavily. but now I know who's willing to step up and take actions in a crisis. and conversely too.
Random note: Congressman Brad Sherman just held up If Anyone Builds It, Everyone Dies in a Congressional hearing and recommended it, saying (rough transcript, might be slight paraphrase): "they're [the AI companies] not really focused on the issue raised by this book, which I recommend, but the title tells it all, If Anyone Builds It Everyone Dies"
I think this is a clear and unambiguous example of the book's theory of change having at least one success: being an object that can literally be held and pointed to by someone in power.
Mostly, I just need to decide to spend a block of time writing, instead of doing other work. But aside from that, I have less need of co-writers, and more of a need for audiences that I'm actively writing for and engaging with (who those audiences are will be different for the many things that I have in the backlog).
I feel confused about the rationalists online. It feels like I tolerate you online because I AGREE with ai risk and know many of you personally. However if I encountered yall in the wilds of the internet I would not be moved. I would perhaps bounce off. And a younger me would maybe have even made you my enemy. I’m not sure how to fix this. The space seems void of a certain type of pragmatic femininity and aesthetic sensibility
*adding more specifics:
fyi, it would have been a very small update in favor, under the Likelihood Principle.
I would rate the observation "my wife has the initials LLM" as being slightly more common assuming a simulation hypothesis than assuming a non-simulation hypothesis.
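Spelling it out: by Bayes, the posterior odds are the prior odds times the likelihood ratio,

$$\frac{P(\text{sim} \mid E)}{P(\neg\text{sim} \mid E)} = \frac{P(\text{sim})}{P(\neg\text{sim})} \cdot \frac{P(E \mid \text{sim})}{P(E \mid \neg\text{sim})},$$

so if the observation $E$ (“my wife has the initials LLM”) were, say, 1% more likely under simulation (a made-up number), the odds in favor shift by exactly that 1%, whatever the prior.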
Bernie Sanders has released a video on x-risk featuring a discussion with Eliezer, Nate, Daniel Kokotajlo, and Jeffrey Ladish. An excerpt from it appears to be blowing up on twitter.
There is currently a post on Reddit (https://www.reddit.com/r/singularity/comments/1rktwmm/i_study_whether_ais_can_be_conscious_today_one/)
that shows an LLM emailing an AI consciousness researcher asking about its own consciousness. How legit can this be? If it is actually legit, it's kind of mind-blowing and should be setting off alarms in many labs.
What do you mean by 'legit'?
Can someone explain why Simulator theory was written about at the end of 2022, people semi-forgot about it or moved on, and then Personas made a big splash recently?
Can't speak for everybody, but when I read Simulators I was like "yeah, obviously GPTs are generative models, not agents, duh, why do people need an entire post to tell them that", and didn't really expect the thing where agents are like "characters" in the world model to scale to high levels of capability. That seeming to happen raised the salience of the ideas a lot for me.
A quick note on various alignment affordances that the model personas research agenda might offer. I'm interested in takes on how useful people think each of these is.
Thanks! Much appreciated.
I think there are two meanings of robustness here:
Why is there so much emphasis on OpenAI and its arrangement with the Department of War relative to GDM's and xAI's, and is that rational? While OpenAI seems like it's behaving much worse than Anthropic, it seems arguably better than those other two, and I'm worried this is a case of it being punished for doing more than nothing (or rather, that some of the ire currently focused on OpenAI should focus on them).
Agree that OpenAI's and the Department of War's comms about their arrangement were weird, sketchy, and triggering (but not necessarily worse than complete silence, in my mind)
I am criticizing OpenAI not just because of the terms of their contract, but because they previously said they had the same redlines as Anthropic, and then, not two days later, signed a contract abandoning those redlines, while quite transparently lying about whether the redlines were protected.
That is bad behavior, and I'm glad they're getting pushback about it. When you claim to stand for principles, you're taking on additional social cost when you abandon those principles.
I wouldn't care nearly as much if they had accepted the contract that they...
Looking back at the self-other overlap research, it seems there is a way to simplify the problem. Instead of trying to find a "self" representation and an "other" representation, we can instead do something like "create a complexity penalty for internal representations". In other words, simply require that the system reduce the complexity of the entire internal residual stream/hidden state/activations over time, while maintaining its predictive/world-modelling accuracy. This requires it to essentially "share representations" for things that have shared...
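A minimal sketch of what that objective could look like (my construal, not the original self-other overlap setup; the L1 norm is just one cheap proxy for "complexity"):

```python
import torch
import torch.nn.functional as F

def loss_with_complexity_penalty(logits, targets, hidden_states, lam=1e-4):
    # logits: (batch, seq, vocab); targets: (batch, seq);
    # hidden_states: list of residual-stream tensors, one per layer.
    # Predictive term: maintain world-modelling accuracy.
    pred_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    # Complexity term: L1 norm on activations, nudging the model to
    # reuse (share) representations rather than keep separate ones.
    complexity = sum(h.abs().mean() for h in hidden_states)
    return pred_loss + lam * complexity
```

A rate-distortion or description-length penalty would be closer to the spirit of "complexity" than L1, but the structure (predictive loss plus a regularizer on internal state) is the same.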