Comment author: Anja 19 December 2012 05:41:48PM *  5 points

I like how you specify utility directly over programs, it describes very neatly how someone who sat down and wrote a utility function

would do it: First determine how the observation could have been computed by the environment and then evaluate that situation. This is a special case of the framework I wrote down in the cited article; you can always set

This solves wireheading only if we can specify which environments contain wireheaded (non-dualistic) agents, delusion boxes, etc.

Comment author: brazil84 29 November 2012 09:09:39PM 2 points

Further to my last comment, it occurs to me that pretty much everyone is a wirehead already. Drink diet soda? You're a wirehead. Have sexual relations with birth control? Wireheading. Masturbate to internet porn? Wireheading. Ever eat junk food? Wireheading.

I was reading online that for a mere $10,000, a man can hire a woman in India to be a surrogate mother for him. Just send $10,000 and a sperm sample and in 9 months you can go pick up your child. Why am I not spending all my money to make third world children who bear my genes? I guess it's because I'm too much of a wirehead already.

Comment author: Anja 29 November 2012 11:36:37PM 0 points

You are a wirehead if you consider your true utility function to be genetic fitness.

Comment author: davidpearce 28 November 2012 10:50:26PM 8 points

A very nice post. Perhaps you might also discuss Felipe De Brigard's "Inverted Experience Machine Argument" http://www.unc.edu/~brigard/Xmach.pdf To what extent does our response to Nozick's Experience Machine Argument typically reflect status quo bias rather than a desire to connect with ultimate reality?

If we really do want to "stay in touch" with reality, then we can't wirehead or plug into an "Experience Machine". But this constraint does not rule out radical superhappiness. By genetically recalibrating the hedonic treadmill, we could in principle enjoy rich, intelligent, complex lives based on information-sensitive gradients of bliss - eventually, perhaps, intelligent bliss orders of magnitude richer than anything physiologically accessible today. Optionally, genetic recalibration of our hedonic set-points could in principle leave much if not all of our existing preference architecture intact - defanging Nozick's Experience Machine Argument - while immensely enriching our quality of life. Radical hedonic recalibration is also easier than, say, the idealised logical reconciliation of Coherent Extrapolated Volition because hedonic recalibration doesn't entail choosing between mutually inconsistent values - unless of course one's values are bound up with inflicting or undergoing suffering.

IMO one big complication with discussions of "wireheading" is that our understanding of intracranial self-stimulation has changed since Olds and Milner discovered the "pleasure centres". Taking a mu opioid agonist like heroin is in some ways the opposite of wireheading because heroin induces pure bliss without desire (shades of Buddhist nirvana?), whereas intracranial self-stimulation of the mesolimbic dopamine system involves a frenzy of anticipation rather than pure happiness. So it's often convenient to think of mu opioid agonists as mediating "liking" and dopamine agonists as mediating "wanting". We have two ultimate cubic centimetre sized "hedonic hotspots" in the rostral shell of the nucleus accumbens and ventral pallidum http://www.lsa.umich.edu/psych/research%26labs/berridge/publications/Berridge%202003%20Brain%20%26%20Cog%20Pleasures%20of%20brain.pdf where mu opioid agonists play a critical signalling role. But anatomical location is critical. Thus the mu opioid agonist remifentanil actually induces dysphoria http://www.ncbi.nlm.nih.gov/pubmed/18801832 - the opposite of what one might naively suppose.

Comment author: Anja 29 November 2012 03:41:42AM 4 points

To what extent does our response to Nozick's Experience Machine Argument typically reflect status quo bias rather than a desire to connect with ultimate reality?

I think the argument that people don't really want to stay in touch with reality but rather want to stay in touch with their past makes a lot of sense. After all we construct our model of reality from our past experiences. One could argue that this is another example of a substitute measure, used to save computational resources: Instead of caring about reality we care about our memories making sense and being meaningful.

On the other hand I assume I wasn't the only one mentally applauding Neo for swallowing the red pill.

Comment author: Alexei 28 November 2012 08:56:33AM 6 points

Anja, this is a fantastic post. It's very clear, easy to read, and it made a lot of sense to me (and I have very little background in thinking about this sort of stuff). Thanks for writing it up! I can understand several issues a lot more clearly now, especially how easy (and tempting) it is for an agent that has access to its source code to wirehead itself.

Comment author: Anja 29 November 2012 03:27:56AM 1 point

Thank you.

Comment author: devas 28 November 2012 11:32:47AM 2 points

I agree with Alexei, this has just now helped me a lot.

Although I now have to ask a stupid question; please have pity on me, I'm new to the site and I have little knowledge to work off.

What would happen if we set an algorithm inside the AGI assigning negative infinite utility to any action which modifies its own utility function and said algorithm itself?

This would be within reasonable parameters; ideally, it could change its utility function, but only along certain pre-approved paths, so that it could actually move around.

Reasonable here is a magic word, in the sense that it's a black box which I don't know how to map out.

Comment author: Anja 29 November 2012 03:27:17AM 4 points

What would happen if we set an algorithm inside the AGI assigning negative infinite utility to any action which modifies its own utility function and said algorithm itself?

There are several problems with this approach: First of all, how do you specify all actions that modify the utility function? How likely do you think it is that you can exhaustively specify all sequences of actions that lead to modification of the utility function in a practical implementation? Experience with cryptography has taught us that there is almost always some side-channel attack that the original developers have not thought of, and that is just in the case of human vs. human intelligence.
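A minimal sketch of the blacklist problem described above, under toy assumptions: the action names, the `FORBIDDEN` set, and the utility numbers are all hypothetical and chosen only for illustration. It shows a guard that assigns negative infinite utility to the one modification action the designers listed, while an unlisted action with the same effect slips through.

```python
import math

# Hypothetical blacklist: the only self-modifying action the designers anticipated.
FORBIDDEN = {"overwrite_utility"}

# Hypothetical utilities as the agent currently evaluates them.
# "flip_memory_bit" stands in for a side channel that also alters the
# utility function but was never added to the blacklist.
base_utility = {
    "make_paperclip": 1.0,
    "overwrite_utility": 100.0,
    "flip_memory_bit": 100.0,
}

def guarded_utility(action, utilities):
    """Return -inf for blacklisted actions, otherwise the ordinary utility."""
    if action in FORBIDDEN:
        return -math.inf
    return utilities[action]

def choose(actions, utilities):
    """Pick the action with the highest guarded utility."""
    return max(actions, key=lambda a: guarded_utility(a, utilities))

best = choose(list(base_utility), base_utility)
print(best)
```

The guard works exactly as specified, yet the agent still picks the unlisted route, because the blacklist enumerates actions rather than effects.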

Forbidden actions in general seem like a bad idea with an AGI that is smarter than us; see, for example, the AI Box experiment.

Then there is the problem that we actually don't want any part of the AGI to be unmodifiable. The agent might revise its model of how the universe works (like we did when we went from Newtonian physics to quantum mechanics) and then it has to modify its utility function or it is left with gibberish.

All that said, I think what you described corresponds to the hack evolution has used on us: We have acquired a list of things (or schemas) that will mess up our utility functions and reduce agency and those just feel icky to us, like the experience machine or electrical stimulation of the brain. But we don't have the luxury of learning by making lots and lots of mistakes that evolution had.

Comment author: timtyler 28 November 2012 11:47:52PM *  5 points

My 2011 "Utility counterfeiting" essay categorises the area a little differently:

It has "utility counterfeiting" as the umbrella category - and "the wirehead problem" and "the pornography problem" as sub-categories.

In this categorisation scheme, the wirehead problem involves getting utility directly - while the pornography problem involves getting utility by manipulating sensory inputs. The latter corresponds to Nozick's experience machine, or Ring and Orseau's delusion box.

Calling the umbrella category "wireheading" leaves you with the problem of what to call these subcategories.

Comment author: Anja 29 November 2012 02:54:11AM 3 points

You might be right. I thought about this too, but it seemed people on LW had already categorized the experience machine as wireheading. If we rebrand, we should maybe say "self-delusion" instead of "pornography problem"; I really like the term "utility counterfeiting" though and the example about counterfeit money in your essay.

Comment author: Eliezer_Yudkowsky 28 November 2012 01:20:43AM 12 points

The main split between the human cases and the AI cases is that the humans are 'wireheading' w.r.t. one 'part' or slice through their personality that gets to fulfill its desires at the expense of another 'part' or slice, metaphorically speaking; pleasure taking precedence over other desires. Also, the winning 'part' in each of these cases tends to be a part which values simple subjective pleasure, winning out over parts that have desires over the external world and desires for more complex interactions with that world (in the experience machine you get the complexity but not the external effects).

In the AI case, the AI is performing exactly as it was defined, in an internally unified way; the ideals by which it is called 'wireheaded' are only the intentions and ideals of the human programmers.

I also don't think it's practically possible to specify a powerful AI which actually operates to achieve some programmer goal over the external world, without the AI's utility function being explicitly written over a model of that external world, as opposed to its utility function being written over histories of sensory data.

Illustration: In a universe operating according to Conway's Game of Life or something similar, can you describe how to build an AI that would want to actually maximize the number of gliders, without that AI's world-model being over explicit world-states and its utility function explicitly counting gliders? Using only the parts of the universe that directly impinge on the AI's senses - just the parts of the cellular automaton that impinge on the AI's screen - can you find any maximizable quantity that corresponds to the number of gliders in the outside world? I don't think you'll find any possible way to specify a glider-maximizing utility function over sense histories unless you only use the sense histories to update a world-model and have the utility function be only over that world-model, and even then the extra level of indirection might open up a possibility of 'wireheading' (of the AI's real operation vs. programmer-desired glider-maximizing operation) if any number of plausible minor errors were made.
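A rough sketch of the distinction drawn in this illustration, under simplifying assumptions: the world is a set of live cells, the glider counter recognises only one phase and orientation of an isolated glider, and the 5x5 "screen" is a hypothetical stand-in for the AI's senses. None of these names come from the comment. It shows two worlds with different glider counts that produce the same observation, so a utility defined only over what reaches the screen cannot tell them apart.

```python
# Live cells are (row, col) pairs. One phase and orientation of a glider.
GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

def place(pattern, dr, dc):
    """Translate a pattern by (dr, dc)."""
    return {(r + dr, c + dc) for r, c in pattern}

def count_gliders(world):
    """Utility over the explicit world-state: count isolated gliders
    (this one phase only) anywhere in the world."""
    count = 0
    for r, c in world:
        # Treat (r, c) as the glider's top cell (0, 1), so its origin is (r, c - 1).
        r0, c0 = r, c - 1
        # The 3x3 pattern plus a one-cell margin that must otherwise be empty.
        box = {(i, j) for i in range(r0 - 1, r0 + 4) for j in range(c0 - 1, c0 + 4)}
        if world & box == place(GLIDER, r0, c0):
            count += 1
    return count

def observe(world, size=5):
    """Hypothetical sensor: only cells inside a size x size screen at the origin."""
    return frozenset((r, c) for r, c in world if 0 <= r < size and 0 <= c < size)

world_a = place(GLIDER, 0, 0)
world_b = world_a | place(GLIDER, 20, 20) | place(GLIDER, 40, 40)

print(count_gliders(world_a), count_gliders(world_b))
print(observe(world_a) == observe(world_b))
```

Here `count_gliders` needs the whole world-state as input, while `observe` discards exactly the information that distinguishes the two worlds.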

Definition: An agent is an algorithm that models the effects of (several different) possible future actions on the world and performs the action that yields the highest value according to some evaluation procedure.

The word "value" seems unnecessarily value-laden here.

Alternatively: A consequentialist agent is an algorithm with causal connections both to and from the world, which uses the causal effect of the world upon itself (sensory data) to build a predictive model of the world, which it uses to model the causal outcomes of alternative internal states upon the world (the effect of its decisions and actions), evaluates these predicted consequences using some algorithm and assigns the prediction an ordered or continuous quantity (in the standard case, expected utility), and then decides an action corresponding to expected consequences which are thresholded above, relatively high, or maximal in this assigned quantity.

Simpler: A consequentialist agent predicts the effects of alternative actions upon the world, assigns quantities over those consequences, and chooses an action whose predicted effects have high value of this quantity, therefore operating to steer the external world into states corresponding to higher values of this quantity.
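The simpler definition can be sketched roughly as follows. This assumes the "maximal" variant of the definition and a finite set of actions; `choose_action`, `predict`, `utility`, and the toy numbers are hypothetical names and values introduced only for illustration.

```python
def choose_action(actions, predict, utility):
    """Return the action whose predicted consequences score highest.

    predict(action) -> list of (probability, outcome) pairs (the world-model).
    utility(outcome) -> number assigned to a predicted consequence.
    """
    def expected_utility(action):
        return sum(p * utility(outcome) for p, outcome in predict(action))
    return max(actions, key=expected_utility)

# Toy world-model: outcomes are predicted glider counts.
def predict(action):
    if action == "build":
        return [(0.5, 4), (0.5, 0)]  # risky: expected 2 gliders
    return [(1.0, 1)]                # "wait": 1 glider for certain

chosen = choose_action(["build", "wait"], predict, utility=lambda gliders: gliders)
print(chosen)
```

Note that `utility` is applied to predicted outcomes in the world-model, not to the agent's sensory data.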

Comment author: Anja 28 November 2012 01:51:43AM 8 points

The word "value" seems unnecessarily value-laden here.

Changed it to "number".

Comment author: dspeyer 27 November 2012 10:42:32PM 1 point

Can this be generalized to more kinds of minds? I suspect that many humans don't exactly have utility functions or plans for maximizing them, but are still capable of wireheading or choosing not to wirehead.

Comment author: Anja 27 November 2012 11:37:17PM *  5 points

You are correct in pointing out that for human agents the evaluation procedure is not a deliberate calculation of expected utility, but some messy computation we have little access to. In many instances this can however be reasonably well translated into the framework of (partial) utility functions, especially if our preferences approximately satisfy transitivity, continuity and independence.

For noticing discrepancies between true and substitute utility it is not necessary to know both functions exactly; it suffices to have an icky feeling that tells you that you are acting in a way that is detrimental to your (true) goals.

If all else fails we can time-index world states and equip the agent with a utility function by pretending that he assigned utility of 1 to the world state he actually brought about and 0 to the others. ;)

Comment author: timtyler 18 November 2012 03:33:10AM 1 point

AIXI actually has a configurable horizon function. It's described on page 30 of AIXIgentle.

Comment author: Anja 19 November 2012 04:44:44AM 3 points

There is also a more detailed paper by Lattimore and Hutter (2011) on discounting and time consistency that is interesting in that context.

Comment author: AlexMennen 18 November 2012 08:11:54PM *  2 points

No, you don't. If you tried to represent Agent 2 in that notation, you would get

modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_<k, yx_k) + u_(k+1)(yx_<k, yx_k:k+1) + ... + u_n(yx_<k, yx_k:n)]*M(yx_<k, yx_k:n), where y_m = modeled_action(n, m) for m>k.

You were using u_k to represent the utility of the last step of its input, so that total utility is the sum of the utilities of its prefixes, while I was using u_k to represent the utility of the whole sequence. If I adapt Agent 4 to your use of u_k, I get

modeled_action(n, k) = argmax(y_k) sum(x_k) [u_k(yx_<k, yx_k) + u_k(yx_<k, yx_k:k+1) + ... + u_k(yx_<k, yx_k:n)]*M(yx_<k, yx_k:n), where y_m = modeled_action(n, m) for m>k.
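For readability, the two plain-text expressions above can be transcribed into LaTeX as follows. This is only a typeset rendering of the same formulas, with no change of meaning intended; the labels "Agent 2" and "Agent 4" follow the wording of the comment.

```latex
% Agent 2 in the prefix-utility notation
\[
\mathrm{modeled\_action}(n,k) = \arg\max_{y_k} \sum_{x_k}
  \bigl[\, u_k(yx_{<k},\, yx_k) + u_{k+1}(yx_{<k},\, yx_{k:k+1}) + \dots + u_n(yx_{<k},\, yx_{k:n}) \,\bigr]\,
  M(yx_{<k},\, yx_{k:n})
\]

% Agent 4 adapted to the same use of u_k
\[
\mathrm{modeled\_action}(n,k) = \arg\max_{y_k} \sum_{x_k}
  \bigl[\, u_k(yx_{<k},\, yx_k) + u_k(yx_{<k},\, yx_{k:k+1}) + \dots + u_k(yx_{<k},\, yx_{k:n}) \,\bigr]\,
  M(yx_{<k},\, yx_{k:n})
\]

% In both cases: y_m = \mathrm{modeled\_action}(n,m) for m > k.
```

The only difference between the two is the subscript on the utility terms: Agent 2 uses u_k through u_n, while the adapted Agent 4 uses u_k throughout.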

Comment author: Anja 19 November 2012 04:26:07AM *  3 points

I am starting to see what you mean. Let's stick with utility functions over histories of length m_k (whole sequences) like you proposed and denote them with a capital U to distinguish them from the prefix utilities. I think your Agent 4 runs into the following problem: modeled_action(n,m) actually depends on the actions and observations yx_{k:m-1} and needs to be calculated for each combination, so y_m is actually

which clutters up the notation so much that I don't want to write it down anymore.

We also get into trouble with taking the expectation: the observations x_{k+1:n} are only considered in modeling the actions of the future agents, but not now. What is M(yx_<k,yx_k:n) even supposed to mean? Where do the x's come from?

So let's torture some indices:

where n>=k and

This is not really AIXI anymore and I am not sure what to do with it, but I like it.
