All of PavleMiha's Comments + Replies

"It's much easier to find parts of the system that don't affect values than it is to nail down exactly where the values are encoded." - I really don't see why this is true, how can you only change parts that don't affect values if you don't know where values are encoded?

I guess I don't really see that in myself. If you offered me a brain chip that would make me smarter but make me stop caring for my family, I simply wouldn't do it. Maybe I'd meditate to make myself want to watch less TV, but that's because watching TV isn't really part of what I'd consider my "core" desires.

Richard_Kennaway
If I know the change in advance then of course I won't endorse it. But if I get my smartness upgraded, and as a result of being smarter come to discard some of my earlier values as trash, what then? All changes of beliefs or values feel like I got closer to the truth. It is as if I carry around my own personal lightbulb, illuminating my location on the landscape of possible ideas and leaving everything that differs from it in the darkness of error. But the beacon moves with me. Wherever I stand, that is what I think right. To have correct beliefs, I can attend to the process of how I got there - is it epistemically sound? - rather than complacently observe the beacon of enlightenment centre on whatever place I stand. But how shall I know what are really the right values?

Quite curious to see Eliezer's or someone else's take on this subject, if you could point me in the right direction!

Portia
God, this was years and years ago. He essentially argued (recalling from memory) that if humans knew that installing an update would make them evil, but they aren't evil now, they wouldn't install the update, and wondered whether you could implement the same in AI to get it to refuse intelligence gains that would fuck over alignment. Technically extremely vague, and clearly ended up on the abandoned pile. I think it was infeasible for multiple reasons: you cannot predict your own alignment shift, an alignment shift resulting from becoming smarter may well be a correct alignment shift in hindsight, and it is tricky to make an AI resist realignment when we are not sure whether we aligned it correctly in the first place. I remember him arguing it in an informal blog article, and I do not recall much deeper arguments.
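A minimal toy sketch of the mechanism recalled above, purely illustrative (the function name and the dictionary-of-floats value representation are my assumptions, not anything from the original argument): an agent accepts an upgrade only if its predicted post-upgrade values match its current ones, which immediately runs into the catch the thread points at - it can only refuse shifts it can predict.

```python
# Toy sketch (illustrative only): an agent refuses an upgrade if it predicts
# the upgrade would change its current values. The value representation and
# numbers below are made up for this example.

def accepts_upgrade(current_values, predicted_values_after_upgrade):
    """Accept the upgrade only if values are predicted to stay the same."""
    return predicted_values_after_upgrade == current_values

# An upgrade predicted to preserve values is accepted...
print(accepts_upgrade({"care_for_family": 1.0}, {"care_for_family": 1.0}))  # True
# ...one predicted to erode them is refused.
print(accepts_upgrade({"care_for_family": 1.0}, {"care_for_family": 0.2}))  # False
# Limitation noted in the thread: this only helps for shifts the agent can predict.
```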

Ah, I do personally find that a lot better than wholesale uploading, but even then I'd stop short of complete replacement. I would be too afraid that, without noticing, I would lose my subjective experience - the people doing the procedure would never know the difference. Additionally, I think that if such a procedure would stop people from having kids, a lot of them wouldn't want to do it - somewhat akin to having kids with a completely new genetic code, which most people seem not to want. It's hard to predict the exact details of these procedures and what public opinion of them will be, but it would only take some people consistently refusing for their genes to keep propagating.

MSRayne
I feel like "losing subjective experience without noticing" is somehow paradoxical. I don't believe that that's a thing that can conceivably happen. And I really don't understand the kids thing. But I've never cared about having children and the instinct makes no sense to me so maybe you're right.

I completely agree that our behaviour doesn't maximise the outer goal. My mysteriously capitalised "Pretty Good" was intended to point in this direction - I find it interesting that we still have some kids, even when we could have none and still have sex and do other fun things. Declining populations would also point to worse alignment. I would consider properly bad alignment to be no kids at all, or the destruction of the planet and the human race along with it, although my phrasing, and my thinking on this, is quite vague.

There is an element of unsustain...

TurnTrout
Yeah, I suspect it's actually pretty hard to get a mesa-optimizer which maximizes some simple, internally represented utility function. I am seriously considering a mechanistic hypothesis where "robust non-maximalness" is the default. That, on its own, does not guarantee safety, but I think it's pretty interesting.

So, like, a couple would decide to have kids, and they would just pick a set of genes entirely unrelated to theirs to maximise whatever characteristics they valued?

If I understand it correctly, I still feel like most people would choose not to do this; a lot of people seem against even minor genetic engineering, let alone something as major as that. I do understand that a lot of the reticence towards genetic engineering has other sources besides "this wouldn't feel like my child", so it's hard to make any clear predictions.

Yeah, anthropomorphising evolution is pretty ...

So if I upload my brain onto silicon, but don't destroy my meat self in the process, how is the one in the silicon me? Would I feel the qualia of the silicon me? Should I feel better about being killed after I've done this process? I really don't think it's a matter of the Overton window; people do have an innate desire not to die, and unless I'm missing something this process seems a lot like dying with a copy of me left somewhere.

MSRayne
I'm talking about gradual uploading. Replacing neurons in the brain with computationally identical units of some other computing substrate gradually, one by one, while the patient is awake and is able to describe any changes in consciousness and clearly state if something is wrong so that it can be reversed. Not copying or any other such thing.

Yes, that's exactly the direction this line of thought is pulling me in! Although perhaps I am less certain we can copy the mechanics of the brain, and more keen on looking at the environments that led to human intelligence developing the way it did, and whether we can do the same with AI.

Gunnar_Zarncke
Agree. The project I'm working on primarily tries to model the attention and reward systems. We don't try to model the brain closely, but only the structures that are relevant.
PavleMiha

I don't think people have shown any willingness to modify themselves anywhere close to that extent. Most people believe mind uploading would be equivalent to death (I've only found a survey of philosophers [1]), so I don't see a clear path for us to abandon our biology entirely. Really, the clearest path I can see is us being replaced by AI in mostly unpleasant ways, but I wouldn't exactly call that humanity at that point.

I'd even argue that, if given the choice to just pick a whole new set of genes for their kids unrelated to theirs, most people would say no. A l...

I suspect (but can't prove) that most people would not upload themselves to a non-biological substrate if given the choice - only 27% of philosophers [1] believe that uploading your brain would mean that you survive on the non-biological substrate. I also suspect that people would not engineer the desire to have kids out of themselves. If most people want to have kids, I don't think we can assume that they would change that desire, a bit like we don't expect very powerful AGIs to allow themselves to be modified. The closest I can think of right now would be t...

Charlie Steiner
The point about genetic engineering isn't anything to do with not having kids. It's about not propagating your own genome. Kinda like uploading, we would keep "having kids" in the human sense, but not in the sense used by evolution for the last few billion years. It's easy to slip between these by anthropomorphizing evolution (choosing "sensible" goals for it, conforming to human sensibilities), but worth resisting. In the analogy to AI, we wouldn't be satisfied if it reinterpreted everything we tried to teach it about morality in the way we're "reinterpreting evolution" even today.
MSRayne
This is probably more due to uploading being outside the Overton window than anything. The existence of large numbers of sci-fi enthusiasts and transhumanists who think otherwise implies that this is a matter of culture and perhaps education, not anything innate to humans. I personally want to recycle these atoms and live in a more durable substrate as soon as it is safe to do so. But this is because I am a bucket of memes, not a bucket of genes; memes won the evolution game a long time ago, and from their perspective, my goals are perfectly aligned. Also, I think the gene-centered view is shortsighted. Phenotypes are units of selection as much as genes are; they propagate themselves by means of genes the same way genes propagate themselves by means of phenotypes. It's just that historically genes had much more power over this transaction. Even I do not want to let go of my human shape entirely - though I will after uploading experiment with other shapes as well - so the human phenotype retains plenty of evolutionary fitness into the future.

I agree with you on what the inner optimiser is. I might not have been able to make myself super clear in the OP, but I see the "outer" alignment as some version of "propagate our genes", and I find it curious that that outer goal produced a very robust "want to have kids" inner alignment. I did also try to make the point that the alignment isn't maximal in some way - yeah, we don't have 16 kids, and men don't donate to sperm banks as much as possible, or do other things that might maximise gene propagation - but even that I find interesting: we fulfill evolution's "outer goal" somewhat, without going into paperclip-maximiser-style "propagate genes at all costs". This seems to me like something we would want out of an AGI.
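A minimal toy sketch of that contrast, purely illustrative (the option list, scores, costs, and the simple "good enough" rule are assumptions of mine, not anything from the thread): an agent that maximises the outer goal picks the extreme option regardless of cost, while one that only needs the goal met somewhat settles for a moderate option.

```python
# Toy illustration (my own framing, not from the thread): pursuing an
# "outer goal" somewhat versus maximising it at all costs.
# All options, scores, and costs below are made up for this example.

def maximiser(options):
    """Pick the option with the highest outer-goal score, ignoring side costs."""
    return max(options, key=lambda o: o["goal_score"])

def satisficer(options, threshold):
    """Pick the cheapest option whose outer-goal score is 'good enough'."""
    good_enough = [o for o in options if o["goal_score"] >= threshold]
    return min(good_enough, key=lambda o: o["side_cost"]) if good_enough else None

options = [
    {"name": "no kids",          "goal_score": 0,  "side_cost": 0},
    {"name": "a couple of kids", "goal_score": 2,  "side_cost": 1},
    {"name": "16 kids",          "goal_score": 16, "side_cost": 10},
]

print(maximiser(options)["name"])      # "16 kids" - the outer goal pursued at all costs
print(satisficer(options, 1)["name"])  # "a couple of kids" - the outer goal fulfilled "somewhat"
```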

Genes being concentrated geographically is a fascinating idea - thanks for the book recommendation, I'll definitely have a look.

Niceness does seem like the easiest to explain with our current frameworks, and it makes me wonder whether there is scope to train agents in shared environments where they are forced to play iterated games, either with other artificial agents or with us. Unless an AI can take immediate decisive action, as in a fast take-off scenario, it will, at least for a while, need to play repeated games. This does seem to be covered under the i...
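A minimal sketch of the kind of iterated game this points at, assuming a standard iterated prisoner's dilemma with made-up payoffs and two stock strategies (none of this is from the thread itself): a "nice" reciprocating strategy gives up a little in the first round against a defector and then holds its own, while two nice strategies playing each other do best overall - the usual story for why niceness can be favoured in repeated play.

```python
# Toy iterated prisoner's dilemma: "C" = cooperate, "D" = defect.
# Payoffs and strategies are illustrative, chosen for this example only.

PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    """Defect every round, regardless of history."""
    return "D"

def play(strategy_a, strategy_b, rounds=100):
    """Play repeated rounds and return the two total scores."""
    history_a, history_b = [], []   # each entry: (my_move, their_move)
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a)
        move_b = strategy_b(history_b)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

print(play(tit_for_tat, always_defect))  # (99, 104): loses round one, then matches
print(play(tit_for_tat, tit_for_tat))    # (300, 300): mutual cooperation pays best
```

The design point of the sketch is just that the payoff to "niceness" only shows up because the game repeats; in a one-shot game, defection dominates, which is the intuition behind the fast take-off caveat above.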