Anja comments on A definition of wireheading - Less Wrong
The main split between the human cases and the AI cases is that the humans are 'wireheading' with respect to one 'part' or slice through their personality, metaphorically speaking, which gets to fulfill its desires at the expense of another 'part' or slice: pleasure takes precedence over other desires. Also, the winning 'part' in each of these cases tends to be one that values simple subjective pleasure, and it wins out over parts that have desires about the external world and desires for more complex interactions with that world (in the experience machine you get the complexity but not the external effects).
In the AI case, the AI is performing exactly as it was defined, in an internally unified way; the ideals by which it is called 'wireheaded' are only the intentions and ideals of the human programmers.
I also don't think it's practically possible to specify a powerful AI which actually operates to achieve some programmer goal over the external world, without the AI's utility function being explicitly written over a model of that external world, as opposed to its utility function being written over histories of sensory data.
Illustration: In a universe operating according to Conway's Game of Life or something similar, can you describe how to build an AI that would want to actually maximize the number of gliders, without that AI's world-model being over explicit world-states and its utility function explicitly counting gliders? Using only the parts of the universe that directly impinge on the AI's senses - just the parts of the cellular automaton that impinge on the AI's screen - can you find any maximizable quantity that corresponds to the number of gliders in the outside world? I don't think you'll find any possible way to specify a glider-maximizing utility function over sense histories unless you only use the sense histories to update a world-model and have the utility function be only over that world-model, and even then the extra level of indirection might open up a possibility of 'wireheading' (of the AI's real operation vs. programmer-desired glider-maximizing operation) if any number of plausible minor errors were made.
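To make the contrast concrete, here is a minimal Python sketch. Everything in it is illustrative and hypothetical, not from the original comment. The world is an unbounded Life grid stored as a set of live cells. `utility_over_world` counts gliders over the full world-state, while `observe` returns only the cells inside the AI's rectangular 'screen'. `utility_over_senses` is one naive proxy over sense data: it counts the gliders visible on the screen. The glider detector is itself a simplifying assumption, because it only recognizes gliders whose live cells form an isolated 8-connected cluster, so a glider touching other debris is missed.

```python
from itertools import product

# Illustrative sketch: a Life state is the set of live (row, col) cells on an
# unbounded plane. All names here are hypothetical.
GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in product((-1, 0, 1), repeat=2) if (dr, dc) != (0, 0)]

def step(live):
    """One generation of Conway's Game of Life (birth on 3, survival on 2 or 3)."""
    counts = {}
    for cell in live:
        for nb in neighbors(cell):
            counts[nb] = counts.get(nb, 0) + 1
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}

def normalize(cells):
    """Translate a pattern so its bounding box starts at (0, 0)."""
    cells = list(cells)
    r0 = min(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    return frozenset((r - r0, c - c0) for r, c in cells)

def symmetries(cells):
    """The 8 rotations and reflections of a pattern, each normalized."""
    out, pts = set(), list(cells)
    for _ in range(4):
        pts = [(c, -r) for r, c in pts]  # rotate by 90 degrees
        out.add(normalize(pts))
        out.add(normalize((r, -c) for r, c in pts))  # mirror image
    return out

def glider_shapes():
    """Every phase of the glider in every orientation."""
    shapes, g = set(), set(GLIDER)
    for _ in range(4):
        shapes |= symmetries(g)
        g = step(g)
    return shapes

SHAPES = glider_shapes()

def clusters(live):
    """Split live cells into 8-connected clusters."""
    seen, result = set(), []
    for start in live:
        if start in seen:
            continue
        seen.add(start)
        stack, cluster = [start], set()
        while stack:
            cell = stack.pop()
            cluster.add(cell)
            for nb in neighbors(cell):
                if nb in live and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(cluster)
    return result

def utility_over_world(state):
    """Glider count over the full world-state (isolated gliders only)."""
    return sum(normalize(cl) in SHAPES for cl in clusters(state))

def observe(state, window):
    """Sensory data: only the live cells inside the AI's rectangular 'screen'."""
    (r0, c0), (r1, c1) = window
    return {(r, c) for r, c in state if r0 <= r < r1 and c0 <= c < c1}

def utility_over_senses(observation):
    """Naive proxy: count the gliders visible on the screen."""
    return utility_over_world(observation)

world = {(r + 1, c + 1) for r, c in GLIDER} | {(r, c + 20) for r, c in GLIDER}
screen = ((0, 0), (10, 10))
print(utility_over_world(world))                    # 2
print(utility_over_senses(observe(world, screen)))  # 1
```

Here the screen shows one glider while the world contains two. Any quantity computed only from `observe(...)` has no access to the off-screen glider, so maximizing it is not the same as maximizing the world's glider count.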
The word "value" seems unnecessarily value-laden here.
Alternatively: A consequentialist agent is an algorithm with causal connections both to and from the world. It uses the causal effect of the world upon itself (sensory data) to build a predictive model of the world, and uses that model to predict the causal outcomes of alternative internal states upon the world (the effects of its decisions and actions). It evaluates these predicted consequences using some algorithm that assigns each prediction an ordered or continuous quantity (in the standard case, expected utility), and then decides on an action whose expected consequences are above a threshold, relatively high, or maximal in this assigned quantity.
Simpler: A consequentialist agent predicts the effects of alternative actions upon the world, assigns quantities to those consequences, and chooses an action whose predicted effects have a high value of this quantity, thereby steering the external world into states corresponding to higher values of this quantity.
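The loop described in these two definitions can be sketched in Python. This is a minimal, hypothetical illustration rather than anyone's actual design. `update_model`, `predict`, and `utility` are placeholder callables, predicted outcomes are assumed to come as a finite probability table, and of the three selection rules mentioned (thresholded, relatively high, maximal) it implements only the threshold and the maximum. The toy glider dynamics are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class ConsequentialistAgent:
    # Hypothetical interface; none of these names come from the original text.
    update_model: Callable[[Any, Any], Any]  # (model, observation) -> new model
    predict: Callable[[Any, Any], dict]      # (model, action) -> {outcome: probability}
    utility: Callable[[Any], float]          # quantity assigned to a predicted outcome
    model: Any

    def expected_utility(self, action) -> float:
        outcomes = self.predict(self.model, action)
        return sum(p * self.utility(o) for o, p in outcomes.items())

    def act(self, observation, actions: list, threshold: Optional[float] = None):
        # Sensory data flows in and is used only to update the world-model.
        self.model = self.update_model(self.model, observation)
        if threshold is not None:
            # Satisficing: take the first action whose expected utility clears the bar.
            for action in actions:
                if self.expected_utility(action) >= threshold:
                    return action
        # Maximizing (also the fallback when no action clears the threshold).
        return max(actions, key=self.expected_utility)

def predict_gliders(count, action):
    # Toy dynamics, purely illustrative: "build" usually adds one glider.
    if action == "build":
        return {count + 1: 0.9, count: 0.1}
    return {count: 1.0}

agent = ConsequentialistAgent(
    update_model=lambda model, obs: obs,  # toy world: the count is observed directly
    predict=predict_gliders,
    utility=float,
    model=0,
)
print(agent.act(3, ["wait", "build"]))                 # build (about 3.9 vs 3.0)
print(agent.act(3, ["wait", "build"], threshold=3.0))  # wait
```

In this sketch `utility` is applied to outcomes predicted by the model rather than to raw observations. With `threshold=3.0` the agent settles for the first acceptable action ("wait") instead of the best one ("build").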
Changed it to "number".