JGWeissman comments on Late Great Filter Is Not Bad News - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (75)
I would say it is part of checking for reflective consistency. Ideally, there shouldn't be arguments that change your (terminal) values, so if there are, you want to so you can figure out what is wrong and how to fix it.
I don't think that explanation makes sense. Suppose an AI thinks it might have a security hole in its network stack, so that if someone sends it a certain packet, it would become that person's slave. It would try to fix that security hole, without actually seeking to have such a packet sent to itself.
We humans know that there are arguments out there that can change our values, but instead of hardening our minds against them, some of us actually try to have such arguments sent to us.
In the deontological view of values this is puzzling, but in the consequentialist view it isn't: we welcome arguments that can change our instrumental values, but not our terminal values (A.K.A. happiness/pleasure/eudaimonia/etc.). In fact I contend that it doesn't even make sense to talk about changing our terminal values.
Congratulations, you just reinvented [a portion of] PCT. ;-)
[Clarification: PCT models the mind as a massive array of simple control circuits that act to correct errors in isolated perceptions, with consciousness acting as a conflict-resolver to manage things when two controllers send conflicting commands to the same sub-controller. At a fairly high level, a controller might be responsible for a complex value: like correcting hits to self-esteem, or compensating for failings in one's aesthetic appreciation of one's work. Such high-level controllers would thus appear somewhat anthropomorphically agent-like, despite simply being something that detects a discrepancy between a target and an actual value, and sets subgoals in an attempt to rectify the detected discrepancy. Anything that we consider of value potentially has an independent "agent" (simple controller) responsible for it in this way, but the hierarchy of control does not necessarily correspond to how we would abstractly prefer to rank our values -- which is where the potential for irrationaity and other failings lies.]