All of kibber's Comments + Replies

kibber20

I think what the OP is saying is that each luigi step is actually in superposition, and therefore each successive line adds to the probability of collapse. However, from a pure trope perspective, I believe this is not really the case: in most works of fiction that feature a twist, the author tends to leave at least some subtle clues foreshadowing it (the luigi turning out to be a waluigi). So lines that carry no such clues can at least sometimes decrease the probability of a waluigi collapse.
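
One way to make this precise, as a quick Bayesian sketch of my own (not something from the OP): treat "this character is a waluigi" as a hypothesis and each new line as evidence. The posterior odds then update as

$$\frac{P(\text{waluigi} \mid \text{line})}{P(\text{luigi} \mid \text{line})} = \frac{P(\text{line} \mid \text{waluigi})}{P(\text{line} \mid \text{luigi})} \cdot \frac{P(\text{waluigi})}{P(\text{luigi})}.$$

If twist foreshadowing is more likely under the waluigi hypothesis, a line containing no such clues has a likelihood ratio below 1 and shifts the odds back toward luigi, which is the sense in which some lines can decrease the probability of collapse.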

kibber32

To clarify, I meant that the AI is unlikely to have empathy by default (being able to perfectly simulate a person does not in itself require empathy to be part of the reward function).

If we try to hardcode it, Goodhart's curse seems relevant: https://arbital.com/p/goodharts_curse/

0Gunnar_Zarncke
But note that Reward is not the optimization target
kibber*42

An AI that can be aligned to the preferences of even just one person is already an aligned AI, and we have no idea how to do that.

An AI that is able to ~perfectly simulate what a person would feel would not necessarily want to perform actions that make the person feel good. Humans are somewhat likely to do that because we have actual (not simulated) empathy, which makes us feel bad when someone close to us feels bad, and the AI is unlikely to have that. We even have humans who lack it (i.e., sociopaths), and they are still humans, not AIs!

3Kaj_Sotala
Is there some particular reason to assume that it'd be hard to implement?
kibber30

...from Venus, and only animals left on Earth, so one more planet than we had before.

3rkyeun
Well, until we get back there. It's still ours even if we're on vacation.