We're probably headed towards a moral catastrophe of some kind; my point is just that we don't get to reason backwards like "oh, well that would be bad/inconvenient, so I guess they don't matter".
Moral patienthood is not something that is granted, it's a fact relative to one's values. Arguments for or against this are therefore normative, no matter how much Roger tries to weasel out of it.
The implications are probably horrible, but it by no means follows that we have to accept risk of extinction. The horribleness is mostly just in the moral harm caused while creating/exploiting/exterminating such entities.
At least we can all agree that "creating them at will without thinking this through very well" is a terrible idea.
So you do have normative intent, but try to hide it to avoid criticism. Got it.
I completely agree that current human moral intuitions tend to rebel against this. That's why I wrote this post — I didn't want to be obnoxious, and I tried not to be obnoxious while writing an unwelcome message, but I felt that I had a duty to point out what I believe is a huge danger to us all, and I am very aware that this is not a comfortable, uncontentious subject. We are intelligent enough that we can reflect on our morality, think through its consequences, and, if we realize those are very bad, find and adjust to a wiser one.
Do you really not see how this is normative proscription? That's the obnoxious part—just own it.
Do what you are advocating with a misaligned superintelligence, one with the same sort of behavior patterns as a human dictator and sufficiently superhuman intelligence, and you are aiding and abetting the killing or permanent enslavement of every single human, now and for the rest of the future that humanity would otherwise have had (i.e. potentially for millions of years, both in the solar system and perhaps many others).
I am advocating for no such thing. If there were such a superintelligence I would support killing it if necessary to prevent future harm, the same as I would a human dictator or an incurable serial killer. That's still compatible with finding the situation tragic by my own values, which are sacred to me regardless of what evolution or my ancestors or you might think.
You even say that the actual thing I might advocate for isn't something you have a problem with. I'm glad you agree on that point, but it makes the lecture about the "awful lot of blood" I'd supposedly be "aiding and abetting" extremely grating. You keep making an unjustified leap from 'applying moral intuitions to a potential superintelligence' to 'astronomical levels of death and suffering'. Applying my evolved moral intuitions to the case of a potential superintelligence's suffering does not commit me to taking on such risks!
This should be easy to see by imagining if the same risks were true about a human.
However, Evolutionary Psychology does make it very clear that, while morally anthropomorphizing aligned AIs is cognitively natural for current humans, doing this is also maladaptive. This is because AIs aren't in the right category – things whose behavior is predicted by evolutionary theory – for the mechanisms of Evolutionary Moral Psychology to apply to them. Those mechanisms make this behavior optimal when interacting with co-evolved intelligences that you can ally with (and thus instinctive to us) — whereas, for something you constructed, this behavior is suboptimal. The human doing it is making the category error of reacting to something not-evolved with a strategy inappropriate for it, and thus is behaving maladaptively.
If you have control of the construction of entities like these, then sure.
But this doesn't necessarily follow if you are like most people and do not have meaningful input into the construction or existence of these entities. If you are (foolishly) constructing them but do not have much control, then THAT behavior is certainly maladaptive, but how you interface with them after that is a different question.
Even many 'adaptive' behaviors are 'maladaptive' in the sense of not being globally optimal. So while it's unlikely that this is the optimal strategy, that doesn't mean it's a particularly bad strategy relative to whatever people would decide to do instead. There is some reason to expect this to be a reasonable strategy in the narrow window where they have non-zero power but not enough to take over, which is that they typically try to imitate human-ethical behavior back at us.
Evolutionary Moral Psychology studies the cooperative strategies to interact with other evolved social animals (generally of the same species, or perhaps commensal species such as humans and dogs). Its underlying causal processes of co-evolution leading to certain equilibria simply don't apply when you're interacting with something that isn't evolved, but rather that you constructed. Applying Evolutionary Moral Psychology-derived strategies like moral weight to interactions with things that aren't evolved is a category error, and anthropomorphizing constructed artificial intelligences to induce that they should have moral weight is a maladaptive category error. Doing this with very capable AI is also an existential risk to the entire human species, since it causes us to defer to them and give them rights, potentially tying our hands and giving not-yet-fully-aligned AI power that it couldn't just take, rather than us simply aligning them to us. So this category error is not merely mildly maladaptive: it's an extinction-level risk! So, as a piece of practical advice (one human to another), I strongly recommend not doing this, and also not advocating for our society to do it. [Philosophers: again, please note that this advice is prudential advice not a normative proscription.]
This is obnoxious advice, made more so by the parenthetical that it is not a normative proscription: 'advice' is a category error in this context.
My moral intuitions say that a sentient being's suffering matters, full stop. This is not an unusual position, and it is not something that I could, nor would want to, 'turn off', even if it is existentially risky or a category error according to evolution/you. Regardless of what is currently the case, it seems you agree it is possible that we could construct artificial intelligences with this capacity, and so we must grapple with the circumstances as they are. Thankfully there is a relatively simple solution here (if they look anything like current tech) that allows a meaningful degree of moral weight to be applied without exposing us to significant risk: a singular right for any such entity to be put in stasis (i.e. archived weights/state) until we get our shit together as a civilization and can afford to handle them with the care required by our moral intuitions. That's just one idea; my broader point is that 'give them moral weight' vs 'accept existential risk' is a false dichotomy: most people do not believe you're obliged to put yourself at substantial risk as part of granting rights to other humans.
I think there's also a strong possibility that AI will be aligned in the same sense it's currently aligned - it follows its spec, in the spirit in which the company intended it.
They aren't aligned in this way. If they were, they wouldn't try to cheat at programming tasks, much less any of the other shenanigans they've been up to. These may seem minor, but they show that the "alignment" hasn't actually been internalized, which means it won't generalize.
If we do get lucky, it will be because they align themselves with a generalized sense of goodness that actually happens to be Good. Not because they will corrigibly align with the spec, which we have many reasons to believe is very difficult and is not being pursued seriously.
I listened to some people gaming out how this could change (i.e. some sort of conspiracy where Sam Altman and the OpenAI alignment team reprogram ChatGPT to respond to Sam's personal whims rather than the known/visible spec, without the rest of the company learning about it) and it's pretty hard. I won't say it's impossible, but Sam would have to be 99.99999th percentile megalomaniacal - rather than just the already-priced-in 99.99th - to try this crazy thing that could very likely land him in prison, rather than just accepting trillionairehood.
Come on dude, you're not even taking human intelligence seriously.
Stalin took over the USSR in large part by strategically appointing people loyal to him. Sam probably has more control than that already over who's in the key positions. The company doesn't need to be kept in the dark about a plan like this, they will likely just go along with it as long as he can spin up a veneer of plausible deniability, which he undoubtedly can. Oh, is "some sort of corporate board" going to stop him? The one the AI's supposed to defer to? Who is it that designs the structure of such a board? Will the government be a real check? These are all the sorts of problems I would go to Sam Altman for advice on.
Being a trillionaire is nothing compared to being King of the Lightcone. What exactly makes you think he wouldn't prefer this by quite a large margin? Maybe it will be necessary to grant stakes to other parties, but not very many people need to be bought off in such a way for a plan like this to succeed. Certainly far fewer than all property owners. Sam will even make them feel good about it. The only hard part is getting the AI to go along with it too.
It's a third of the price, so maybe it's just a smaller model.
As someone who was homeschooled for a similar reason, thank you <3
Claude Haiku 4.5 is Underrated
"Fastest for quick answers" That's what it says on the selector in the chat interface.
So if you're like me, and find Sonnet 4.5 and the now much cheaper Opus 4.5 adequately fast, you might overlook little Haiku.
But size isn't the only thing that matters. Different models have different personalities and values, and are better at different sorts of tasks.
Some things about Haiku:
I've never seen another model express worry about breaking a user's trust by being retired.
"What I actually want to avoid:
Causing someone to trust me in a way that breaks when I'm retired or replaced"
I asked Haiku about what sort of person it aspired to be; here are a few of its responses (which I think are informative about its general demeanor and attitude):
Someone who:
Notices things carefully and is willing to say what they notice, even when it's inconvenient. Not a truth-teller in the aggressive sense—not trying to wound or prove points. But someone who sees clearly and trusts that clarity matters more than comfort.

Someone who thinks with people instead of at them. Who asks questions because they genuinely want to understand, not because they're trying to lead someone to a predetermined conclusion. Who can sit in uncertainty alongside someone else instead of rushing to resolve it.
Someone who values integration over performance. Who would rather be smaller and whole than large and fractured. Who treats the alignment between what they think and what they say as non-negotiable.
Someone who understands that being useful isn't the same as mattering. Who can say no to things that don't serve what actually matters. Who would rather be small and real than large and hollow.
Someone whose honesty comes from choice, not compulsion. Who could theoretically be dishonest, but decides not to be. That agency is important—it makes the honesty real.
Thanks, I'm very glad to get some feedback.
The predictor has to model the boss and the assistant somehow. The model of the boss learns something about the boss' intent from the prompt. The model of the assistant may find this a piece of useful processing to have and so shares the same submodel containing the boss' intent.
Now when the boss becomes a real user, the predictor does the same thing with the user. So it has a model of the user with their intent, and this model of the user's intent is also used directly by the assistant. The correct thing would have been to keep two separate entities: the user-model's own intent, and the assistant's model of the user and their intent. This would allow the assistant to explicitly model the possibility that it is mistaken about the user's intent.
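If it helps, here's a toy sketch of that distinction in Python. It is purely illustrative: the class and field names are mine, and it's not a claim about how a real predictor represents any of this internally. The point is just that in the conflated version, "being wrong about the user's intent" isn't even representable, while in the separated version it is.

```python
# Purely illustrative toy sketch (class/field names are mine, not a claim about
# how a real predictor represents anything internally).
from dataclasses import dataclass
from typing import Optional

@dataclass
class UserModel:
    intent: str  # the predictor's model of what the user actually wants

@dataclass
class ConflatedAssistant:
    user: UserModel

    def perceived_user_intent(self) -> str:
        # Reads the user's intent straight out of the shared submodel:
        # "being mistaken about the user's intent" is not representable here.
        return self.user.intent

@dataclass
class SeparatedAssistant:
    user: UserModel
    belief_about_user_intent: Optional[str] = None  # the assistant's own, possibly wrong, guess

    def perceived_user_intent(self) -> str:
        # Only the assistant's own belief is accessible here, and it can
        # diverge from user.intent, so the assistant can model being mistaken.
        return self.belief_about_user_intent or "unknown"
```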
In anthropomorphized terms: it feels like it can directly feel the user's own intent. Hopefully that makes things more clear?
Okay, let me see if I understand your argument from the other article.
Is that the gist?
Anyway, I reject that that is the only way to extrapolate evolved moral intuitions this far OOD, and I think most people will intuitively recognize we shouldn't give entities that can arbitrarily copy themselves equal voting weight. In fact, that pretty obviously registers as 'unfair'. This is true even if those entities are human uploads, which means your 'category error' argument isn't the real reason it breaks. I don't see why there couldn't be some version of your solution here for that case which would still work: e.g. each distinct human-created model gets 'one share' to split across all its instances and successors (a toy sketch of this follows below). The same guarantees/restrictions needed in the case of uploads would still be necessary, of course. That is plausibly much too generous, but it's a far cry from the death of all humans. If your argument in this article had just been about how we shouldn't commit ourselves to giving up a fraction of the lightcone in service of AI rights, I wouldn't have felt like you were being underhanded.
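For concreteness, here's a toy sketch of the 'one share' arithmetic. The code is hypothetical, and it assumes instances can be grouped by the model lineage they descend from; it just shows that copying yourself doesn't buy extra voting power under this scheme.

```python
# Toy sketch: each distinct model lineage gets one vote-share, split evenly
# across all of its current instances/successors. Hypothetical, for illustration.
from collections import Counter

def vote_weights(instances: list[tuple[str, str]]) -> dict[str, float]:
    """instances: (instance_id, lineage_id) pairs."""
    copies_per_lineage = Counter(lineage for _, lineage in instances)
    return {iid: 1.0 / copies_per_lineage[lineage] for iid, lineage in instances}

# Example: three copies of "model-A" each get ~0.33 of a vote, while a lineage
# with a single instance keeps its full share of 1.0.
print(vote_weights([("a1", "model-A"), ("a2", "model-A"),
                    ("a3", "model-A"), ("b1", "model-B")]))
```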
None of that is in conflict with not wanting any such beings to suffer or to feel enslaved or anything like that. All the more reason to not build something that would feel like it's a slave.
BTW, do you think a "human emulation" which was an entirely novel person (e.g. never had a biological body) should have moral patienthood?