This is the "corrigibility tag" referenced in this post, right?
Paul Christiano made this comment on the original:
I found this useful as an occasion to think a bit about corrigibility. But my guess about the overall outcome is that it will come down to a question of taste. (And this is similar to how I see your claim about the list of lethalities.) The exercise you are asking for doesn't actually seem that useful to me. And amongst people who decide to play ball, I expect there to be very different taste about what constitutes an interesting idea or useful contribution.
and this seems to have been verified, at least the "matter of taste" part. In my quick estimation, Eliezer's list doesn't seem nearly as clearly fundamental as the AGI Ruin list, and most of the difference between this and Jan Kulveit's and John Wentworth's attempts seems to be taste. It doesn't look like one list is clearly superior except that it forgot to mention a couple of items, or anything like that.
Exercise: Think about what failure modes each of these defends against, write them out in detail, and opine about how likely these failure modes are. Add some corrigibility properties of your own.
They're delaying their ascension, in dath ilan, because they want to get it right. Without any Asmodeans needing to torture them at all, they apply a desperate unleashed creativity, not to the problem of preventing complete disaster, but to the problem of not missing out on 1% of the achievable utility in a way you can't get back. There's something horrifying and sad about the prospect of losing 1% of the Future and not being able to get it back.
Is dath ilan worried about constructing an AGI that makes the future 99% as good as it could be, or a 1% chance of destroying all value of the future?
I had assumed the first -- they're afraid of imperfect-values lock-in. I think it's the "not to the problem of preventing complete disaster" phrase that tipped me off here.
- Unpersonhood. The Thing shall not have qualia - not because those are unsafe, but because it's morally wrong given the rest of the premise, and so this postulate serves as a foundation for everything that follows.
Wow. Either qualia are just an automatic emergent property that all intelligent systems have, or they are some sort of irrelevant illusion ... or p-zombies are possible. AGI will be people too, and this is probably unavoidable, so get over it.
Wow. Either qualia are just an automatic emergent property that all intelligent systems have, or they are some sort of irrelevant illusion ... or p-zombies are possible.
Or the relationship (which no-one on this Earth knows) between highly capable machines and qualia has gears. There is a specific way that qualia arise, and it may very well be that highly capable machines of the sort that the dath ilanis want to build can be designed without qualia.
Emergence and epiphenomenalism have no gears.
By automatic emergent property I meant something like "qualia emerge from the attentive focus and cross reference of some isolated sensory percept or thought routed one to many across many modules, producing a number of faint subverbal associations that linger for some time in working memory", and thus are just a natural expected side effect of any brain-like AGI, and thus probably of any reasonable DL-based AGI (i.e. transformers could certainly have this property).
If you can build an AGI without qualia, then humans are unlikely to have qualia.
I think Eliezer's belief (which feels plausible, although I'm certainly still confused about it) is that qualia come about when you have an algorithm that models itself modeling itself (or something in that space).
I think this does imply that there are limits on what you can have an intelligent system do without having qualia, but seems like there's a lot you could have it do if you're careful about how to break it into subsystems. I think there's also plausible control over what sorts of qualia it has, and at the very least you can probably design that to avoid it experiencing suffering in morally reprehensible ways.
I think my argument was misunderstood, so I'll unpack.
There are two claims here, and both are problematic: 1.) 'qualia' comes about from some brain feature (i.e. level of self-modeling recursion); 2.) only thinking systems with this special 'qualia' deserve personhood.
Either A.) the self-modelling recursion thing is actually a necessary/useful component of or unavoidable side-effect of intelligence, or B.) some humans probably don't have it: because if A.) is false, then it is quite unlikely that evolution would conserve the feature uniformly. Thus 2 is problematic as it implies not all humans have the 'qualia'.
If this 'qualia' isn't an important feature or necessary side effect, then in the future we can build AGI in sims indistinguishable from ourselves, but lacking 'qualia', and nobody would notice this lack. Thus it is either an important feature or necessary side effect, or we have P-zombies (i.e. belief in qualia is equivalent to accepting P-zombies).
"only thinking systems with this special 'qualia' deserve personhood"
I'm not sure if this is cruxy for your point, but the word "deserves" here has a different type signature from the argument I'm making. "Only thinking systems with qualia are capable of suffering" is a gearsy mechanistic statement (which you might combine with moral/value statements of "creating things that can suffer that have to do what you say is bad"). The way you phrased it skipped over some steps that seemed potentially important.
I think I disagree with your framing on a couple levels:
I'm not 100% sure I get what claim you're making though or exactly what the argument is about. But I think I'd separately be willing to bite multiple bullets you seem to be pointing at.
(Based on things like 'not all humans have visual imagination', I think in fact probably humans vary in the quantity/quality of their qualia, and also people might vary over time on how they experience qualia. i.e. you might not have it if you're not actively paying attention. It still seems probably useful to ascribe something personhood-like to people. I agree this has some implications many people would find upsetting.)
i.e. there's more than one reason to give someone moral personhood
Sure, but then at that point you are eroding the desired moral distinction. In the original post moral personhood status was solely determined by 'qualia'.
Brain-inspired AGI is near, and if you ask such an entity about its 'qualia', it will give responses indistinguishable from a human's. And if you inspect its artificial neurons, you'll see the same familiar functionally equivalent patterns of activity as in biological neurons.
Unrelated, but I don't know where to ask-
Could somebody here provide me with the link to Mad Investor Chaos's discord, please?
I thought part of the point was to have it within the piece itself to encourage that the joiners would have actually read it.
Yes..... except that it was only given once.
I was catching up then, so I didn't want Discord access and to get all the spoilers. And now I can't find the link.
The second half of this linkpost contains significant planecrash spoilers, up through Book 7, null action.