Slider comments on Value learning: ultra-sophisticated Cake or Death - Less Wrong

Post author: Stuart_Armstrong 17 June 2014 04:36PM


Comment author: Slider 19 June 2014 11:14:59AM

I have real trouble with this step:

Thus hearing "the true morality is the hard task" has significantly reduced the AI's expected utility. It would really have preferred not to have heard this - it would much prefer to have manipulated or murdered the speaker, or simply not talked with them.

I understand that bad news makes one sad, but does that lead to rejecting bad news? Similarly, pain is a good thing: without it you would end up in all sorts of trouble. I would think that having accurate knowledge of a thing's utility is more important than knowing its expectation. If you have a solid 0.5 utility, versus a 50/50 chance of 1 or 0, then in the uncertain case, behaving as if the utility were 0.5 leaves you wrong by 0.5 either way.
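Slider's arithmetic can be checked with a quick sketch (illustrative only, not from the comment): an agent acting on the expectation 0.5, when the true utility is actually 1 or 0, is off by 0.5 whichever value turns out to be real.

```python
# Illustrative sketch of the point above: acting on the expectation 0.5
# when the true utility is either 1 or 0 leaves you wrong by 0.5 in
# whichever case obtains.
expected = 0.5  # expectation of a 50/50 mix of utilities 1 and 0
errors = [abs(true_u - expected) for true_u in (1.0, 0.0)]
# wrong by 0.5 in either case
```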

Comment author: Stuart_Armstrong 19 June 2014 02:18:02PM

I understand that bad news makes one sad but does that lead to rejecting bad news?

For standard Bayesian agents, no. But these value-updating agents behave differently. Imagine a human said to the AI: "If I say good, your action was good, and that will be your values. If I say bad, it will be the reverse." Wouldn't you want to motivate it to say "good"?
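The incentive Stuart describes can be put in numbers with a toy model (my own sketch, not from the post): the human's verdict *defines* the agent's values, so a manipulated "good" verdict is worth more in expectation than listening honestly.

```python
# Toy model (not from the post) of a naive value-updating agent.
# The human's verdict SETS the agent's values: "good" -> its past action
# scores 1, "bad" -> it scores 0. Before the human speaks, the agent
# considers each verdict equally likely.

p_good = 0.5  # agent's prior that the human will say "good"

# Option 1: listen honestly. Expected utility averages over verdicts.
eu_listen = p_good * 1.0 + (1 - p_good) * 0.0  # = 0.5

# Option 2: manipulate the human into saying "good". Since the verdict
# defines the agent's values, the manipulated agent scores 1 for sure.
eu_manipulate = 1.0
```

Under this model, manipulation strictly beats listening (1.0 > 0.5), whereas a fixed-values Bayesian gains nothing from it, since for that agent the verdict is merely evidence about a fact that is already whatever it is.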

Comment author: Slider 19 June 2014 07:55:43PM

I have trouble seeing the difference, as I think you can turn the variable value statements into empirical facts that map to a constant value. Say that cake->yummy->good, cake->icky->bad, death->icky->bad, death->yummy->good. Then the yummy->good connection could be questioned as a matter of fact about the world, not about values. If a Bayesian accepts sad news in that kind of world, how come the value loader tries to shun it?

Comment author: Stuart_Armstrong 20 June 2014 10:33:30AM
Comment author: [deleted] 19 June 2014 07:33:28PM

Wouldn't you want to motivate it to say "good"?

I might be committing mind-projection here, but no. Data is data, evidence is evidence. Expected moral data is, in some sense, moral data: if the AI predicts with high confidence that I will say "bad", this ought already to be evidence that it ought not to have done whatever I'm about to scold it for.

Comment author: Stuart_Armstrong 20 June 2014 10:33:17AM