Squark comments on Reply to Holden on 'Tool AI' - Less Wrong

Post author: Eliezer_Yudkowsky 12 June 2012 06:00PM


Comment author: Eliezer_Yudkowsky 11 July 2012 10:59:07PM 29 points

Didn't see this at the time, sorry.

So... I'm sorry if this reply seems a little unhelpful, and I wish there was some way to engage more strongly, but...

Point (1) is the main problem. AIXI updates freely over a gigantic range of sensory predictors with no specified ontology - it's a sum over a huge set of programs, and we, the users, have no idea what the representations are talking about, except that at the end of their computations they predict, "You will see a sensory 1 (or a sensory 0)." (In my preferred formalism, the program puts a probability on a 0 instead.) Inside, the program could've been modeling the universe in terms of atoms, quarks, quantum fields, cellular automata, giant moving paperclips, slave agents scurrying around... we, the programmers, have no idea how AIXI is modeling the world and producing its predictions, and indeed, the final prediction could be a sum over many different representations.
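The opacity being described can be illustrated with a toy Bayesian mixture over opaque predictor programs (my own sketch, not Hutter's formalism: three hand-written "programs" stand in for the huge program sum, and the fixed weights stand in for 2^-length priors). The user only ever sees each program's probability for the next sense bit, never what it is modeling internally:

```python
# Toy sketch (not AIXI): a Solomonoff-style mixture of opaque predictors.
# Each "program" maps a bit history to P(next bit = 1); the mixture weights
# them by a stand-in for 2^-length and updates the weights by Bayes' rule.
# All names and numbers here are illustrative assumptions.

def constant(p):
    return lambda history: p

def repeat_last(history):
    return 0.9 if history and history[-1] == 1 else 0.1

# hypothetical program ensemble with prior weights ~ 2^-complexity
programs = [(0.5, constant(0.5)), (0.25, repeat_last), (0.25, constant(0.9))]

def predict(history):
    """Mixture prediction for the next sense bit, after updating on history."""
    weights = [w for w, _ in programs]
    for t, bit in enumerate(history):
        for i, (_, prog) in enumerate(programs):
            p1 = prog(history[:t])
            weights[i] *= p1 if bit == 1 else 1.0 - p1
    total = sum(weights)
    weights = [w / total for w in weights]
    return sum(w * prog(history) for w, (_, prog) in zip(weights, programs))

print(predict([1, 1, 1, 1]))  # after four 1s the mixture leans strongly toward 1
```

Nothing in the interface exposes *why* a program predicts a 1; only the final probability comes out, which is the point of (1).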

This means that equation (20) in Hutter is written as a utility function over sense data, where the reward channel is just a special case of sense data. We can easily adapt this equation to talk about any function computed directly over sense data; we can get AIXI to optimize any aspect of its sense data that we please. We can't get it to optimize a quality of the external universe. One of the challenges I listed in my FAI Open Problems talk, and one of the problems I intend to talk about in my FAI Open Problems sequence, is to take the first nontrivial steps toward adapting this formalism. For example: take an equivalent of AIXI in a really simple universe with a really simple goal, something along the lines of a Life universe and a goal of making gliders, and specify something, given unlimited computing power, which would behave like it had that goal without pre-fixing the ontology of the causal representation to that of the real universe. That is, you want something that can range freely over ontologies in its predictive algorithms, but which still behaves like it's maximizing an outside thing like gliders, instead of a sensory channel like the reward channel. This is an unsolved problem!
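To make the contrast concrete, here is a toy illustration (my construction, with an invented glider-matching utility; cells are (x, y) pairs): a utility computed from the external Life world state, versus a utility computed from sense data, which is the only kind an equation-(20)-style formalism expresses directly.

```python
# Toy contrast (my construction, not from the post): a utility over the
# external world state of a Life grid vs. a utility over sense data.
# GLIDER is one phase/orientation of a glider.
GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

def count_gliders(live):
    """Utility over the external world: exact glider matches in 3x3 boxes."""
    if not live:
        return 0
    xs = [x for x, _ in live]
    ys = [y for _, y in live]
    count = 0
    for dx in range(min(xs) - 2, max(xs) + 1):
        for dy in range(min(ys) - 2, max(ys) + 1):
            box = {(dx + i, dy + j) for i in range(3) for j in range(3)}
            translated = {(dx + i, dy + j) for (i, j) in GLIDER}
            if translated == live & box:
                count += 1
    return count

def sensory_reward(sense_bits):
    """Utility over sense data: the kind AIXI-style equations can optimize."""
    return sum(sense_bits)
```

The open problem described above is, roughly, to define something that behaves like `count_gliders` without fixing the (x, y)-grid ontology in advance.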

We haven't even got to the part where it's difficult to say in formal terms how to interpret what a human says s/he wants the AI to plan, and where failures of phrasing of that utility function can also cause a superhuman intelligence to kill you. We haven't even got to the huge buried FAI problem inside the word "optimal" in point (1), which is the really difficult part in the whole thing. Because so far we're dealing with a formalism that can't even represent a purpose of the type you're looking for - it can only optimize over sense data, and this is not a coincidental fact, but rather a deep problem which the AIXI formalism deliberately avoided.

(2) sounds like you think an AI with an alien, superhuman planning algorithm can tell humans what to do without ever thinking consequentialistically about which different statements will result in human understanding or misunderstanding. Anna says that I need to work harder on not assuming other people are thinking silly things, but even so, when I look at this, it's hard not to imagine that you're modeling AIXI as a sort of spirit containing thoughts, whose thoughts could be exposed to the outside with a simple exposure-function. It's not unthinkable that a non-self-modifying superhuman planning Oracle could be developed with the further constraint that its thoughts are human-interpretable, or can be translated for human use without any algorithms that reason internally about what humans understand, but this would at the least be hard. And with AIXI it would be impossible, because AIXI's model of the world ranges over literally all possible ontologies and representations, and its plans are naked motor outputs.

Similar remarks apply to interpreting and answering "What will be its effect on _?" It turns out that getting an AI to understand human language is a very hard problem, and it may very well be that even though talking doesn't feel like having a utility function, our brains are using consequential reasoning to do it. Certainly, when I write language, that feels like I'm being deliberate. It's also worth noting that "What is the effect on X?" really means "What are the effects I care about on X?" and that there's a large understanding-the-human's-utility-function problem here. In particular, you don't want your language for describing "effects" to partition, as the same state of described affairs, any two states which humans assign widely different utilities. Let's say there are two plans for getting my grandmother out of a burning house, one of which destroys her music collection, one of which leaves it intact. Does the AI know that music is valuable? If not, will it not describe music-destruction as an "effect" of a plan which offers to free up large amounts of computer storage by, as it turns out, overwriting everyone's music collection? If you then say that the AI should describe changes to files in general, well, should it also talk about changes to its own internal files? Every action comes with a huge number of consequences - if we hear about all of them (reality described on a level so granular that it automatically captures all utility shifts, as well as a huge number of other unimportant things) then we'll be there forever.

I wish I had something more cooperative to say in reply - it feels like I'm committing some variant of logical rudeness by this reply - but the truth is, it seems to me that AIXI isn't a good basis for the agent you want to describe; and I don't know how to describe it formally myself, either.

Comment author: Squark 08 February 2013 06:51:51PM 0 points

Regarding the question of formalizing an optimization agent whose goals are defined in terms of the external universe rather than sensory input: it is possible to attack the problem by generalizing the framework I described in http://lesswrong.com/lw/gex/save_the_princess_a_tale_of_aixi_and_utility/8ekk for solving the duality problem.

Specifically, consider an "initial guess" stochastic model of the universe, including the machine on which our agent is running. I call it the "innate model" M. Now consider a stochastic process with the same degrees of freedom as M but governed by the Solomonoff semi-measure. This is the "unbiased model" S. The two can be combined by assigning transition probabilities proportional to the product of the probabilities assigned by M and S. If M is sufficiently "insecure" (in particular, it doesn't assign 0 to any transition probability), then the resulting model S', considered as a prior, allows arriving at any computable model after sufficient learning.

Fix a utility function on the space of histories of our model (note that the histories include both intrinsic and extrinsic degrees of freedom). The intelligence I(A) of any given agent A (= a program written into M in the initial state) can now be defined to be the expected utility of A in S'. We can now consider optimal or near-optimal agents in this sense (as opposed to the Legg-Hutter formalism for measuring intelligence, there is no guarantee that a maximum rather than a supremum exists, unless of course we limit the length of the programs we consider).

This is a generalization of the Legg-Hutter formalism which accounts for limited computational resources, solves the duality problem (such agents take into account possible wireheading) and also provides a solution to the ontology problem. It is essentially a special case of the Orseau-Ring framework, but much more specific, since in Orseau-Ring the prior is left completely unspecified. You can think of it as a recipe for constructing Orseau-Ring priors for realistic problems.
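A minimal sketch of the product prescription for combining the innate model M with the unbiased model S (distributions here are dicts mapping next-state to probability; the concrete numbers and the `combine` helper are my illustrative assumptions, not Squark's notation):

```python
# Combine innate model M with unbiased model S by taking the normalized
# product of transition probabilities: S'(s) proportional to M(s) * S(s).
# All concrete states and numbers below are invented for illustration.

def combine(m_probs, s_probs):
    product = {s: m_probs.get(s, 0.0) * s_probs.get(s, 0.0)
               for s in set(m_probs) | set(s_probs)}
    z = sum(product.values())
    return {s: p / z for s, p in product.items()}

M = {"a": 0.8, "b": 0.1, "c": 0.1}  # "innate" guess at transition probabilities
S = {"a": 0.4, "b": 0.4, "c": 0.2}  # stand-in for Solomonoff-derived probabilities
S_prime = combine(M, S)
# Note: if M assigned 0 to some transition, S' could never learn it no matter
# what S says, which is why M must be "insecure" (no zero probabilities).
```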

Comment author: Squark 09 February 2013 01:28:52PM 0 points

I realized that although the idea of a deformed Solomonoff semi-measure is correct, the multiplication prescription I suggested is rather ad hoc. The following construction is a much more natural and justifiable way of combining M and S.

Fix a time parameter t0. Consider a stochastic process S(-t0) that begins at time t = -t0, where t = 0 is the time at which our agent A "forms", governed by the Solomonoff semi-measure. Consider another stochastic process M(-t0) that begins from the initial conditions generated by S(-t0) (I'm assuming M only carries information about dynamics, not about initial conditions). Define S' to be the conditional probability distribution obtained from S(-t0) by imposing two conditions:

a. S and M coincide on the time interval [-t0, 0]

b. The universe contains A at time t=0

Thus t0 reflects the extent to which we are certain about M: it's like telling the agent we have been observing behavior M for time period t0.
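A toy rejection-sampling sketch of this conditioning (everything here, including the alternating "dynamics" and the agent-existence check, is an invented stand-in for illustration, not the actual semi-measure construction):

```python
import random

# Sample histories from a crude stand-in for the Solomonoff process starting
# at t = -t0, then keep only those that (a) follow the innate dynamics M on
# [-t0, 0] and (b) contain the agent A at t = 0.
random.seed(0)
T0 = 4  # how long we claim to have observed behavior M

def sample_history(length):
    # stand-in for sampling from the Solomonoff semi-measure
    return [random.randint(0, 1) for _ in range(length)]

def follows_M(hist):
    # invented innate dynamics M: the state alternates on [-t0, 0]
    return all(hist[i + 1] != hist[i] for i in range(T0))

def contains_agent(hist):
    return hist[T0] == 1  # stand-in for "the universe contains A at t = 0"

samples = (sample_history(T0 + 3) for _ in range(2000))
posterior = [h for h in samples if follows_M(h) and contains_agent(h)]
# every surviving history obeys M up to t = 0 and contains the agent at t = 0
```

Raising T0 shrinks the surviving set, matching the idea that t0 measures how certain we are about M.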

There is an interesting side effect to this framework, namely that A can exert "acausal" influence on the utility by affecting the initial conditions of the universe (i.e. it selects universes in which A is likely to exist). This might seem like an artifact of the model but I think it might be a legitimate effect: if we believe in one-boxing in Newcomb's paradox, why shouldn't we accept such acausal effects?

For models with a concept of space and finite information velocity, like cellular automata, it might make sense to limit the domain of "observed M" in space as well as time, to A's past "light-cone".

Comment author: Eliezer_Yudkowsky 08 February 2013 06:59:48PM 0 points

I cannot even slightly visualize what you mean by this. Please explain how it would be used to construct an AI that made glider-oids in a Life-like cellular automaton universe.

Comment author: Squark 08 February 2013 08:22:19PM 0 points

Is the AI hardware separate from the cellular automaton, or is it a part of it? Assuming the latter, we need to decide which degrees of freedom of the cellular automaton form the program of our AI. For example, we can select a finite set of cells and allow setting their values arbitrarily.

Then we need to specify our utility function. For example, it can be a weighted sum of the number of gliders at different moments of time, or a maximum, or whatever; however, we need to make sure the expectation values converge. The "AI" is then simply the assignment of values to the selected cells in the initial state which yields the maximal expected utility.

Note, though, that if we're sure about the law governing the cellular automaton, then there's no reason to use the Solomonoff semi-measure at all (except maybe as a prior for the initial state outside the selected cells). However, if our idea of the way the cellular automaton works is only an "initial guess", then the expectation value is evaluated w.r.t. a stochastic process governed by a "deformed Solomonoff" semi-measure, in which transitions illegal w.r.t. the assumed cellular automaton law are suppressed by some factor 0 < p < 1 relative to "pure" Solomonoff inference.

Note that, contrary to the case of AIXI, I can only describe the measure of intelligence; I cannot constructively describe the agent maximizing this measure. This is unsurprising, since building a real (bounded computing resources) AI is a very difficult problem.
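A concrete toy version of the known-law case described above (so no Solomonoff deformation is needed; the 3x3 programmable patch, the live-cell-count utility, and the 4-step horizon are all my invented stand-ins, simpler than a glider-counting utility):

```python
from collections import Counter
from itertools import product

# Brute-force the assignment of values to a small set of "program" cells
# that maximizes a utility of the resulting world history, under a fully
# known automaton law (Conway's Life on a sparse set of (x, y) cells).

def life_step(live):
    """One step of Conway's Life."""
    counts = Counter((x + dx, y + dy) for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}

PROGRAM_CELLS = [(x, y) for x in range(3) for y in range(3)]  # the "AI program"

def utility(initial, steps=4):
    """Utility over the world history: live cells in the final state."""
    state = set(initial)
    for _ in range(steps):
        state = life_step(state)
    return len(state)

# the "AI" is just the utility-maximizing assignment to the selected cells
best = max(({c for c, bit in zip(PROGRAM_CELLS, bits) if bit}
            for bits in product((0, 1), repeat=len(PROGRAM_CELLS))),
           key=utility)
```

In the "initial guess" case, `utility` would instead be an expectation under the deformed semi-measure, and the maximization would no longer be a finite brute-force search, which is exactly why only the measure, not the maximizing agent, is described constructively.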