JGWeissman comments on Late Great Filter Is Not Bad News - Less Wrong

14 Post author: Wei_Dai 04 April 2010 04:17AM


Comment author: JGWeissman 05 April 2010 12:11:32AM 1 point [-]

Such "value-seeking" behavior doesn't seem irrational to me, even though I don't know how to account for it in terms of rationality.

I would say it is part of checking for reflective consistency. Ideally, there shouldn't be arguments that change your (terminal) values, so if there are, you want to know about them so you can figure out what is wrong and how to fix it.

Comment author: Wei_Dai 05 April 2010 12:31:00AM 2 points [-]

I don't think that explanation makes sense. Suppose an AI thinks it might have a security hole in its network stack, so that if someone sends it a certain packet, it would become that person's slave. It would try to fix that security hole, without actually seeking to have such a packet sent to itself.

We humans know that there are arguments out there that can change our values, but instead of hardening our minds against them, some of us actually try to have such arguments sent to us.

Comment author: Amanojack 05 April 2010 11:47:21PM 1 point [-]

We humans know that there are arguments out there that can change our values, but instead of hardening our minds against them, some of us actually try to have such arguments sent to us.

In the deontological view of values this is puzzling, but in the consequentialist view it isn't: we welcome arguments that can change our instrumental values, but not our terminal values (a.k.a. happiness/pleasure/eudaimonia/etc.). In fact, I contend that it doesn't even make sense to talk about changing our terminal values.

Comment deleted 05 April 2010 11:26:49PM [-]
Comment author: pjeby 06 April 2010 01:38:36AM *  1 point [-]

My explanation is that the human mind is something like a coalition of different sub-agents, many of which are more like animals or insects than rational agents. In any given context, they will pull the overall strategy in different directions. The overall result is an agent with context dependent preferences, i.e. irrational behavior.

Congratulations, you just reinvented [a portion of] PCT. ;-)

[Clarification: PCT models the mind as a massive array of simple control circuits, each acting to correct errors in an isolated perception, with consciousness acting as a conflict-resolver when two controllers send conflicting commands to the same sub-controller. At a fairly high level, a controller might be responsible for a complex value: correcting hits to self-esteem, say, or compensating for failings in one's aesthetic appreciation of one's work. Such high-level controllers would thus appear somewhat anthropomorphically agent-like, despite simply being something that detects a discrepancy between a target value and an actual value, and sets subgoals in an attempt to rectify the detected discrepancy. Anything that we consider of value potentially has an independent "agent" (simple controller) responsible for it in this way, but the hierarchy of control does not necessarily correspond to how we would abstractly prefer to rank our values -- which is where the potential for irrationality and other failings lies.]
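[The hierarchy described above can be sketched in a few lines of code. This is a toy illustration only, not an implementation from the PCT literature: the class names, the proportional-gain rule, and the averaging conflict-resolver are all my own illustrative assumptions.]

```python
# Toy sketch of a PCT-style control hierarchy (illustrative, not canonical PCT).

class Controller:
    """A simple control circuit: compares a perceived value to a
    reference (target) value and outputs a correction proportional
    to the error. The output becomes a subgoal -- a reference sent
    to a lower-level controller."""
    def __init__(self, name, reference, gain=1.0):
        self.name = name
        self.reference = reference
        self.gain = gain

    def error(self, perception):
        # The discrepancy between target and actual value.
        return self.reference - perception

    def output(self, perception):
        return self.gain * self.error(perception)


def resolve_conflict(commands):
    """Stand-in for the conscious conflict-resolver: when several
    higher-level controllers send different references to the same
    sub-controller, settle on a compromise (here, the mean)."""
    return sum(commands) / len(commands)


# Two high-level "value" controllers (hypothetical examples) issue
# conflicting subgoals to the same sub-controller:
self_esteem = Controller("self-esteem", reference=0.9)
aesthetics = Controller("aesthetics", reference=0.4)

perceived_state = 0.6
commands = [self_esteem.output(perceived_state),
            aesthetics.output(perceived_state)]
subgoal = resolve_conflict(commands)  # compromise reference for the sub-controller
```

[The point of the sketch is that each controller is "agent-like" only in the minimal sense of correcting its own error signal; the irrationality pjeby mentions shows up when, as here, two controllers pull the shared subgoal in opposite directions.]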