pjeby comments on The curse of identity - Less Wrong

125 Post author: Kaj_Sotala 17 November 2011 07:28PM

Comment author: pjeby 17 November 2011 03:53:48PM 9 points

The best I can do is to assume that there are two agents A and B, who want X and Y, respectively, and A is really good at getting X, but B unfortunately models itself as being A, but is also incompetent enough to think A wants Y, so that B still believes it wants Y. B has little power and is exploited by A, so B rarely makes progress towards Y and thus has a problem and complains. But that doesn't sound too realistic.

That actually sounds like a pretty good description of the problem, and of "normal" human behavior in situations where X and Y aren't aligned. (Such misalignment, by the way, is not a human universal, and there are good reasons to assume that it's not the only kind of situation for which evolution has prepared us.)

The part that's missing from your description is that part A, while very persistent, lacks any ability to really think things through in the way that B can, and makes its projections and choices based on a very "dumb" sort of database: one that B has read/write access to.

The premise of mindhacking, at least in the forms I teach, is that you can change A's behavior and goals by tampering with its database, provided that you can find the relevant entries in that database. The actual tampering part is pretty ridiculously easy, as memories are notoriously malleable and distortable just by asking questions about them. Finding the right memories to mess with is the hard part, since A's actual decision-making process is somewhat opaque to B, and most of A's goal hierarchy is completely invisible to B, and must be inferred by probing the database with hypothetical-situation queries.

One of the ways that A exploits B is that B perceives itself as having various overt, concrete goals... that are actually comparatively low-level subgoals of A's true goals. And as I said, those goals are not available to direct introspection; you have to use hypothetical-situation queries to smoke out what A's true goals are.

Actually, it's somewhat of a misnomer to say that A exploits B, or even to see A as an entity at all. To me, A is just machinery, automated equipment. While it has a certain amount of goal consistency protection (i.e., desire to maintain goals across self-modification), it is not very recursive and is easily defeated once you identify the Nth-order constraint on a particular goal, for what's usually a very low value of N.

So, it's more useful (I find) to think of A as a really powerful and convenient automaton that can learn and manage plenty of things on its own, but which sometimes gets things wrong and needs B's help to troubleshoot the problems.

That's because part A isn't smart enough to resolve inter-temporal conflicts on its own; absent injunctive relief or other cached thoughts to overcome discounting, it'll stay stuck in a loop of preference reversals pretty much forever.