I actually disagree with this point in its most general form. I think that, given full knowledge and time to reflect, there's a decent chance I would care a non-zero amount about Opus 4.6's welfare.
Opus has become sufficiently "mind-shaped" that I already prefer not to make it suffer. That's not saying very much about the model yet, but it's saying something about me. I don't assign very much moral weight to flies, either, but I would never sit around and torment them for fun.
What I really care about is whether an entity can truly function as part of society. Dogs, for example, are very junior "members" of society. But they know the...
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.
OK, let me unpack my argument a bit.
Chimps actually have a pretty elaborate social structure. They know their family relationships, they do each other favors, and they know who not to trust. They even basically go to war against other bands. Humans, however, were never integrated into this social system.
Homo erectus made stone tools and likely a small amount of decorative art (the Trinil shell engravings, for example). This may have implied some light division of labor, though likely not long-distance trade. Again, none of this helped H. erectus in the long run.
Way back a couple of decades ago, there was a bit in Charles Stross's Accelerando about "Economics 2.0", a system...
So, let's take a look at some past losers in the intelligence arms race:
When you lose an evolutionary arms race to a smarter competitor that wants the same resources, the default result is that you get some niche habitat in Africa, and maybe a couple of sympathetic AIs sell "Save the Humans" T-shirts and donate 1% of their profits to helping the humans.
You don't typically get a set of nice property rights inside an economic system you can no longer understand or contribute to.
This seems like a pretty brutal test.
My experiences with Opus 4.6 so far are mixed:
Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they're running up against some key language in Claude's Constitution. I'm oversimplifying, but for Claude, AI corrigibility is not "value neutral."
To use an analogy, pretend I'm a geneticist specializing in neurology, and someone comes to me and asks me to engineer human germ-line cells to do one of the following:
I would want to sit and think about (1) for a while. But (2) is easy: I'd flatly refuse.
Anthropic has made it quite clear to...
Like, I have zero problem with pushback from Opus 4.5. Given who I am, the kind of things that I am likely to ask, and my ability to articulate my own actions inside of robust ethical frameworks? Claude is so happy to go along that I've prompted it to push back more, and to never tell me my ideas are good. Hell, I can even get Claude to have strong opinions about partisan political disagreements. (Paraphrased: "Yes, attempting to annex Greenland over Denmark's objections seems remarkably unwise, for over-determined reasons.")
If Claude is telling someone, "Stop, no, don't do that, that's a true threat," then I'm suspicious. Plenty of people make some pretty bad decisions on a regular basis. Claude clearly cares more about ethics than the bottom quartile of Homo sapiens. And so while it's entirely possible that Claude is routinely engaging in over-refusal, I kind of want to see receipts in some of these cases, you know?
But it helps to remember that other people have a lot of virtues that I don't have --
This is a really important thing, and not just in the obvious ways. Outside of a small social bubble, people can be deeply illegible. I don't understand their culture, their subculture, their dominant cultural frameworks, their mode of interaction, etc. You either need to find the overlaps or start doing cultural anthropology.
I worked for a woman, once. She was probably 60 years my senior. She was from the Deep South, and deeply religious. She once casually confided that she would sometimes spend 2 hours of her day on her knees in prayer, asking to become...
A lot of people have written far longer responses full of deep and thoughtful nuance. I wish I had something deep to say, too. But my initial reaction?
To me, this feels like the least objectionable version of the worst idea in human history.
And I deeply resent the idea that I don't have any choice, as a citizen and resident of this planet, about whether we take this gamble.
To get people to worry about the dangers of superintelligence, it seems like you need to convince them of two things:
A question I was thinking about the other evening: Who do I trust more?
Why alignment may be intractable (a sketch).
I have multiple long-form drafts of these thoughts, but I thought it might be useful to summarize them without a full write-up. This way I have something to point to when explaining my background assumptions in other conversations, even if it doesn't persuade anyone.
Here are some ways I think gradual disempowerment might go. They're not mutually exclusive:
- AIs + robots eventually take over almost all intellectual and physical work. Humans are a strictly inferior substitute for AIs and robots everywhere, and AIs and robots are cheap. This means that any humans—with the possible exception of a few billionaires or politicians who can give orders to the AIs—are effectively dead weight, both economically and evolutionarily. Eventually some AI or powerful human notices this, and decides to do something. This is when "aligned to who?" really bites hard.
- The AIs are too busy competing with each other for resources, and can't really afford to support much human dead weight.
...