MarsColony_in10years comments on Morality as Fixed Computation - Less Wrong

Post author: Eliezer_Yudkowsky 08 August 2008 01:00AM


Comment author: MarsColony_in10years 18 August 2015 05:36:05PM 0 points

That's a good point. I think the distinction is that these people are modifying their own instrumental values, but leaving their terminal values (the big meaning-of-life blob of computation) unchanged. I'd go so far as to say that people frequently do this trick by mistake, when they convince themselves that they have various terminal values. This certainly explains things like happy death spirals.

On the other hand, this would be very difficult (impossible?) to test.

EDIT: I've given this a bit more thought, and I wonder what it would feel like from the inside to be a machine learning algorithm that could make limited small self-modifications to its own utility function, including its optimization criteria. This seems like a "simple" enough hack that evolution could have generated it. This also seems to mirror real human psychology surprisingly well.
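To make this concrete, here's a toy sketch (class and feature names are my own invention, not any real system) of an agent whose utility function is a weighted sum of features, and which can only make small, bounded edits to its own weights. The point it illustrates is that many individually tiny self-modifications can compound into a complete reversal of a value:

```python
class SelfModifyingAgent:
    """Toy agent whose utility function is a weighted sum of features.
    It can make small, bounded edits to its own weights -- a crude
    stand-in for the 'limited self-modification' described above."""

    def __init__(self, weights):
        # feature name -> weight; this dict *is* the utility function
        self.weights = dict(weights)

    def utility(self, outcome):
        # outcome: feature name -> observed value
        return sum(w * outcome.get(f, 0.0) for f, w in self.weights.items())

    def self_modify(self, feature, delta, limit=0.1):
        # Only small nudges are allowed; larger edits are clamped.
        delta = max(-limit, min(limit, delta))
        self.weights[feature] = self.weights.get(feature, 0.0) + delta


agent = SelfModifyingAgent({"empathy": 1.0, "ingroup_loyalty": 1.0})

# Repeatedly nudging down a disliked trait: each step is "small",
# but the cumulative drift flips the sign of the value entirely.
for _ in range(20):
    agent.self_modify("empathy", -0.1)

print(agent.weights["empathy"])  # drifts from 1.0 to near -1.0
```

None of the individual edits would look alarming on its own, which is the worry: an agent (or person) applying this rule never faces a single decision that obviously "twists" it.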

I'm imagining trying to answer the question "what would I like to change my utility function to?", while simultaneously not fully understanding the dangers of messing around like that. It seems like this could easily generate people like religious extremists, even if earlier versions of those people would never have deliberately tried to become that twisted. If the other side seems completely wrong and evil, then I can picture disliking parts of myself that resemble the other side, as well as any empathy I may have for them. I can imagine how suppressing those parts of myself would lead to extremism.

I wonder what the official Yudkowsky position on this is. More importantly, I wonder what happens if you get this question wrong while trying to build a Friendly AI. It seems like there might be issues if you assume a static Coherent Extrapolated Volition when it is actually dynamically changing, or vice versa.