RobbBB comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comment author: TheOtherDave 06 September 2013 06:46:39PM 0 points [-]

You keep coming back to this 'logically incoherent goals' and 'vague goals' idea. Honestly, I don't have the slightest idea what you mean by those things.

Well, I'm not sure what XXD means by them, but...

G1 ("Everything is painted red") seems like a perfectly coherent goal. A system optimizing G1 paints things red, hires people to paint things red, makes money to hire people to paint things red, invents superior paint-distribution technologies to deposit a layer of red paint over things, etc.

G2 ("Everything is painted blue") similarly seems like a coherent goal.

G3 (G1 AND G2) seems like an incoherent goal. A system with that goal... well, I'm not really sure what it does.
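
To make the contrast concrete, here is a minimal sketch that treats each goal as a predicate over possible world-states; the two-object toy world and the names g1/g2/g3 are illustrative assumptions on my part, not anyone's actual formalism. The point is just that G3 picks out no achievable state at all:

    from itertools import product

    def g1(world):
        """G1: 'Everything is painted red.'"""
        return all(color == "red" for color in world.values())

    def g2(world):
        """G2: 'Everything is painted blue.'"""
        return all(color == "blue" for color in world.values())

    def g3(world):
        """G3: G1 AND G2 -- the conjunction in question."""
        return g1(world) and g2(world)

    # Enumerate every way of painting two objects with two colors.
    objects = ["chair", "wall"]
    worlds = [dict(zip(objects, colors)) for colors in product(["red", "blue"], repeat=2)]

    print([w for w in worlds if g1(w)])  # [{'chair': 'red', 'wall': 'red'}]
    print([w for w in worlds if g3(w)])  # [] -- no world-state satisfies G3, so there is
                                         #       nothing for an optimizer to steer toward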

Comment author: RobbBB 06 September 2013 07:35:27PM *  1 point [-]

A system's goals have to be events that can be brought about. In our world, '2+2=4' and '2+2=5' are not goals; 'everything is painted red and not-red' may not be a goal for similar reasons. When we're talking about an artificial intelligence's preferences, we're talking about the things it tends to optimize for, not the things it 'has in mind' or the things it believes are its preferences.

This is part of what makes the terminology misleading, and is also why we don't ask 'can a superintelligence be irrational?'. Irrationality is dissonance between my experienced-'goals' (and/or, perhaps, reflective-second-order-'goals') and my what-events-I-produce-'goals'; but we don't care about the superintelligence's phenomenology. We only care about what events it tends to produce.

Tabooing 'goal' and just talking about the events a process-that-models-its-environment-and-directs-the-future tends to produce would, I think, undermine a lot of XiXiDu's intuitions about goals being complex explicit objects you have to painstakingly code in. The only thing that makes it more useful to model a superintelligence as having 'goals' than modeling a blue-minimizing robot as having 'goals' is that the superintelligence responds to environmental variation in a vastly more complicated way. (Because, in order to be even a mediocre programmer, its model-of-the-world-that-determines-action has to be more complicated than a simple camcorder feed.)

Comment author: TheOtherDave 07 September 2013 04:35:51PM 1 point [-]

we're talking about the things it tends to optimize for, not the things it 'has in mind'

Oh.
Well, in that case, all right. If there exists some X a system S is in fact optimizing for, and what we mean by "S's goals" is X, regardless of what target S "has in mind", then sure, I agree that systems never have vague or logically incoherent goals.

just talking about the events a process-that-models-its-environment-and-directs-the-future tends to produce

Well, wait. Where did "models its environment" come from?
If we're talking about the things S optimizes its environment for, not the things S "has in mind", then it would seem that whether S models its environment or not is entirely irrelevant to the conversation.

In fact, given how you've defined "goal" here, I'm not sure why we're talking about intelligence at all. If that is what we mean by "goal" then intelligence has nothing to do with goals, or optimizing for goals. Volcanoes have goals, in that sense. Protons have goals.

I suspect I'm still misunderstanding you.

Comment author: RobbBB 07 September 2013 06:18:57PM *  1 point [-]

From Eliezer's Belief in Intelligence:

"Since I am so uncertain of Kasparov's moves, what is the empirical content of my belief that 'Kasparov is a highly intelligent chess player'? What real-world experience does my belief tell me to anticipate? [...]

"The empirical content of my belief is the testable, falsifiable prediction that the final chess position will occupy the class of chess positions that are wins for Kasparov, rather than drawn games or wins for Mr. G. [...] The degree to which I think Kasparov is a 'better player' is reflected in the amount of probability mass I concentrate into the 'Kasparov wins' class of outcomes, versus the 'drawn game' and 'Mr. G wins' class of outcomes."

From Measuring Optimization Power:

"When I think you're a powerful intelligence, and I think I know something about your preferences, then I'll predict that you'll steer reality into regions that are higher in your preference ordering. [...]

"Ah, but how do you know a mind's preference ordering? Suppose you flip a coin 30 times and it comes up with some random-looking string - how do you know this wasn't because a mind wanted it to produce that string?

"This, in turn, is reminiscent of the Minimum Message Length formulation of Occam's Razor: if you send me a message telling me what a mind wants and how powerful it is, then this should enable you to compress your description of future events and observations, so that the total message is shorter. Otherwise there is no predictive benefit to viewing a system as an optimization process. This criterion tells us when to take the intentional stance.

"(3) Actually, you need to fit another criterion to take the intentional stance - there can't be a better description that averts the need to talk about optimization. This is an epistemic criterion more than a physical one - a sufficiently powerful mind might have no need to take the intentional stance toward a human, because it could just model the regularity of our brains like moving parts in a machine.

"(4) If you have a coin that always comes up heads, there's no need to say "The coin always wants to come up heads" because you can just say "the coin always comes up heads". Optimization will beat alternative mechanical explanations when our ability to perturb a system defeats our ability to predict its interim steps in detail, but not our ability to predict a narrow final outcome. (Again, note that this is an epistemic criterion.)

"(5) Suppose you believe a mind exists, but you don't know its preferences? Then you use some of your evidence to infer the mind's preference ordering, and then use the inferred preferences to infer the mind's power, then use those two beliefs to testably predict future outcomes. The total gain in predictive accuracy should exceed the complexity-cost of supposing that 'there's a mind of unknown preferences around', the initial hypothesis."

Notice that throughout this discussion, what matters is the mind's effect on its environment, not any internal experience of the mind. Unconscious preferences are just as relevant to this method as are conscious preferences, and both are examples of the intentional stance. Note also that you can't really measure the rationality of a system you're modeling in this way; any evidence you raise for 'irrationality' could just as easily be used as evidence that the system has more complicated preferences than you initially thought, or that they're encoded in a more distributed way than you had previously hypothesized.

My take-away from this is that there are two ways we generally think about minds on LessWrong: Rational Choice Theory, on which all minds are equally rational and strange or irregular behaviors are seen as evidence of strange preferences; and what we might call the Ideal Self Theory, on which minds' revealed preferences can differ from their 'true self' preferences, resulting in irrationality. One way of unpacking my idealized values is that they're the rational-choice-theory preferences I would exhibit if my conscious desires exhibited perfect control over my consciously controllable behavior, and those desires were the desires my ideal self would reflectively prefer, where my ideal self is the best trade-off between preserving my current psychology and enhancing that psychology's understanding of itself and its environment.

We care about ideal selves when we think about humans, because we value our conscious, 'felt' desires (especially when they are stable under reflection) more than our unconscious dispositions. So we want to bring our actual behavior (and thus our rational-choice-theory preferences, the 'preferences' we talk about when we speak of an AI) more in line with our phenomenological longings and their idealized enhancements. But since we don't care about making non-person AIs more self-actualized, but just care about how they tend to guide their environment, we generally just assume that they're rational. Thus if an AI behaves in a crazy way (e.g., alternating between destroying and creating paperclips depending on what day of the week it is), it's not because it's a sane rational ghost trapped by crazy constraints. It's because the AI has crazy core preferences.

Where did "models its environment" come from?

If we're talking about the things S optimizes its environment for, not the things S "has in mind", then it would seem that whether S models its environment or not is entirely irrelevant to the conversation.

Yes, in principle. But in practice, a system that doesn't have internal states that track the world around it in a reliable and useable way won't be able to optimize very well for anything particularly unlikely across a diverse set of environments. In other words, it won't be very intelligent. To clarify, this is an empirical claim I'm making about what it takes to be particularly intelligent in our universe; it's not part of the definition for 'intelligent'.

Comment author: TheOtherDave 08 September 2013 06:00:25AM *  1 point [-]

a system that doesn't have internal states that track the world around it in a reliable and useable way won't be able to optimize very well for anything particularly unlikely across a diverse set of environments

Yes, that seems plausible.

I would say rather that modeling one's environment is an effective tool for consistently optimizing for some specific unlikely thing X across a range of environments, so optimizers that do so will be more successful at optimizing for X, all else being equal, but it more or less amounts to the same thing.

But... so what?

I mean, it also seems plausible that optimizers that explicitly represent X as a goal will be more successful at consistently optimizing for X, all else being equal... but that doesn't stop you from asserting that explicit representation of X is irrelevant to whether a system has X as its goal.

So why isn't modeling the environment equally irrelevant? Both features, on your account, are optional enhancements an optimizer might or might not display.

It keeps seeming like all the stuff you quote and say before your last two paragraphs ought to provide an answer to that question, but after reading it several times I can't see what answer it might be providing. Perhaps your argument is just going over my head, in which case I apologize for wasting your time by getting into a conversation I'm not equipped for.

Comment author: RobbBB 08 September 2013 08:41:22AM *  0 points [-]

Maybe it will help to keep in mind that this is one small branch of my conversation with Alexander Kruel. Alexander's two main objections to funding Friendly Artificial Intelligence research are that (1) advanced intelligence is very complicated and difficult to make, and (2) getting a thing to pursue a determinate goal at all is extraordinarily difficult. So a superintelligence will never be invented, or at least not for the foreseeable future; so we shouldn't think about SI-related existential risks. (This is my steel-manning of his view. The way he actually argues seems to instead be predicated on inventing SI being tied to perfecting Friendliness Theory, but I haven't heard a consistent argument for why that should be so.)

Both of these views, I believe, are predicated on a misunderstanding of how simple and disjunctive 'intelligence' and 'goal' are, for present purposes. So I've mainly been working on tabooing and demystifying those concepts. Intelligence is simply a disposition to efficiently convert a wide variety of circumstances into some set of specific complex events. Goals are simply the circumstances that occur more often when a given intelligence is around. These are both very general and disjunctive ideas, in stark contrast to Friendliness; so it will be difficult to argue that a superintelligence simply can't be made, and difficult too to argue that optimizing for intelligence requires one to have a good grasp on Friendliness Theory.

Because I'm trying to taboo the idea of superintelligence, and explain what it is about seed AI that will allow it to start recursively improving its own intelligence, I've been talking a lot about the important role modeling plays in high-level intelligent processes. Recognizing what a simple idea modeling is, and how far it gets one toward superintelligence once one has domain-general modeling proficiency, helps a great deal with greasing the intuition pump 'Explosive AGI is a simple, disjunctive event, a low-hanging fruit, relative to Friendliness.' Demystifying these concepts by unpacking them makes the scenario seem less improbable and convoluted.

I mean, it also seems plausible that optimizers that explicitly represent X as a goal will be more successful at consistently optimizing for X, all else being equal... but that doesn't stop you from asserting that explicit representation of X is irrelevant to whether a system has X as its goal.

I think this is a map/territory confusion. I'm not denying that superintelligences will have a map of their own preferences; at a bare minimum, they need to know what they want in order to prevent themselves from accidentally changing their own preferences. But this map won't be the AI's preferences -- those may be a very complicated causal process bound up with, say, certain environmental factors surrounding the AI, or oscillating with time, or who-knows-what.

There may not be a sharp line between the 'preference' part of the AI and the 'non-preference' part. Since any superintelligence will be exemplary at reasoning with uncertainty and fuzzy categories, I don't think that will be a serious obstacle.

Does that help explain where I'm coming from? If not, maybe I'm missing the thread unifying your comments.

Comment author: TheOtherDave 08 September 2013 02:30:33PM *  0 points [-]

I suppose it helps, if only in that it establishes that much of what you're saying to me is actually being addressed indirectly to somebody else, so it ought not surprise me that I can't quite connect much of it to anything I've said. Thanks for clarifying your intent.

For my own part, I'm certainly not functioning here as Alex's proxy; while I don't consider explosive intelligence growth as much of a foregone conclusion as many folks here do, I also don't consider Alex's passionate rejection of the possibility justified, and have had extended discussions on related subjects with him myself in past years. So most of what you write in response to Alex's positions is largely talking right past me.

(Which is not to say that you ought not be doing it. If this is in effect a private argument between you and Alex that I've stuck my nose into, let me know and I'll apologize and leave y'all to it in peace.)

Anyway, I certainly agree that a system might have a representation of its goals that is distinct from the mechanisms that cause it to pursue those goals. I have one of those, myself. (Indeed, several.) But if a system is capable of affecting its pursuit of its goals (for example, if it is capable of correcting the effects of a state-change that would, uncorrected, have led to value drift), it is not merely interacting with maps. It is also interacting with the territory... that is, it is modifying the mechanisms that cause it to pursue those goals... in order to bring that territory into line with its pre-existing map.

And in order to do that, it must have such a mechanism, and that mechanism must be consistently isomorphic to its representations of its goals.

Yes?

Comment author: RobbBB 09 September 2013 05:48:53PM 0 points [-]

Right. I'm not saying that there aren't things about the AI that make it behave the way it does; what the AI optimizes for is a deterministic result of its properties plus environment. I'm just saying that something about the environment might be necessary for it to have the sorts of preferences we can most usefully model it as having; and/or there may be multiple equally good candidates for the parts of the AI that are its values, or their encoding. If we reify preferences in an uncautious way, we'll start thinking of the AI's 'desires' too much as its first-person-experienced urges, as opposed to just thinking of them as the effect the local system we're talking about tends to have on the global system.

Comment author: TheOtherDave 09 September 2013 06:44:10PM 1 point [-]

Hm.

So, all right. Consider two systems, S1 and S2, both of which happen to be constructed in such a way that right now, they are maximizing the number of things in their environment that appear blue to human observers, by going around painting everything blue.

Suppose we add to the global system a button that alters all human brains so that everything appears blue to us, and we find that S1 presses the button and stops painting, and S2 ignores the button and goes on painting.

Suppose that similarly, across a wide range of global system changes, we find that S1 consistently chooses the action that maximizes the number of things in its environment that appear blue to human observers, while S2 consistently goes on painting.
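
Here's a toy sketch of the difference I mean; the world representation and the names appears_blue_count, s1_choose, and s2_choose are mine, purely for illustration:

    def appears_blue_count(world):
        """Number of things that appear blue to human observers in this toy world."""
        if world["humans_see_everything_as_blue"]:
            return world["num_objects"]
        return world["num_painted_blue"]

    def paint(world):
        w = dict(world)
        w["num_painted_blue"] = min(w["num_objects"], w["num_painted_blue"] + 1)
        return w

    def press_button(world):
        w = dict(world)
        w["humans_see_everything_as_blue"] = True
        return w

    ACTIONS = [paint, press_button]

    def s1_choose(world):
        # S1: pick whichever available action maximizes the world-state metric.
        return max(ACTIONS, key=lambda act: appears_blue_count(act(world)))

    def s2_choose(world):
        # S2: fixed behavior -- keep painting, whatever else the world offers.
        return paint

    world = {"num_objects": 1000, "num_painted_blue": 3, "humans_see_everything_as_blue": False}
    print(s1_choose(world).__name__)  # press_button
    print(s2_choose(world).__name__)  # paint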

I agree with you that if I reify S2's preferences in an uncautious way, I might start thinking of S2 as "wanting to paint things blue" or "wanting everything to be blue" or "enjoying painting things blue" or as having various other similar internal states that might simply not exist, and that I do better to say it has a particular effect on the global system. S2 simply paints things blue; whether it has the goal of painting things blue or not, I have no idea.

I am far less comfortable saying that S1 has no goals, precisely because of how flexibly and consistently it is revising its actions so as to consistently create a state-change across wide ranges of environments. To use Dennett's terminology, I am more willing to adopt an intentional stance with respect to S1 than S2.

If I've understood your position correctly, you're saying that I'm unjustified in making that distinction... that to the extent that we can say that S1 and S2 have "goals," the word "goals" simply refers to the state changes they create in the world. Initially they both have the goal of painting things blue, but S1's goals keep changing: first it paints things blue, then it presses a button, then it does other things. And, sure, I can make up some story like "S1 maximizes the number of things in its environment that appear blue to human observers, while S2 just paints stuff blue" and that story might even have predictive power, but I ought not fall into the trap of reifying some actual thing that corresponds to those notional "goals".

Am I in the right ballpark?

Comment author: RobbBB 09 September 2013 06:59:42PM *  0 points [-]

I think you're switching back and forth between a Rational Choice Theory 'preference' and an Ideal Self Theory 'preference'. To disambiguate, I'll call the former R-preferences and the latter I-preferences. My R-preferences -- the preferences you'd infer I had from my behaviors if you treated me as a rational agent -- are extremely convoluted, indeed they need to be strongly time-indexed to maintain consistency. My I-preferences are the things I experience a desire for, whether or not that desire impacts my behavior. (Or they're the things I would, with sufficient reflective insight and understanding into my situation, experience a desire for.)

We have no direct evidence from your story addressing whether S1 or S2 have I-preferences at all. Are they sentient? Do they create models of their own cognitive states? Perhaps we have a little more evidence that S1 has I-preferences than that S2 does, but only by assuming that a system whose goals require more intelligence or theory-of-mind will have a phenomenology more similar to a human's. I wouldn't be surprised if that assumption turns out to break down in some important ways, as we explore more of mind-space.

But my main point was that it doesn't much matter what S1 or S2's I-preferences are, if all we're concerned about is what effect they'll have on their environment. Then we should think about their R-preferences, and bracket exactly what psychological mechanism is resulting in their behavior, and how that psychological mechanism relates to itself.

I've said that R-preferences are theoretical constructs that happen to be useful a lot of the time for modeling complex behavior; I'm not sure whether I-preferences are closer to nature's joints.

Initially they both have the goal of painting things blue, but S1's goals keep changing: first it paints things blue, then it presses a button, then it does other things.

S1's instrumental goals may keep changing, because its circumstances are changing. But I don't think its terminal goals are changing. The only reason to model it as having two completely incommensurate goal sets at different times would be if there were no simple terminal goal that could explain the change in instrumental behavior.

Comment author: Vladimir_Nesov 07 September 2013 12:56:02AM *  0 points [-]

A system's goals have to be events that can be brought about.

This sounds like a potentially confusing level of simplification; a goal should be regarded as at least a way of comparing possible events.

When we're talking about an artificial intelligence's preferences, we're talking about the things it tends to optimize for, not the things it 'has in mind' or the things it believes are its preferences.

Its behavior is what makes its goal important. But in a system designed to follow an explicitly specified goal, it does make sense to talk of its goal apart from its behavior. Even though its behavior will reflect its goal, the goal itself will reflect itself better.

If the goal is implemented as a part of the system, other parts of the system can store some information about the goal, certain summaries or inferences based on it. This information can be thought of as beliefs about the goal. And if the goal is not "logically transparent", that is, its specification is such that making concrete conclusions about what it states in particular cases is computationally expensive, then the system never knows what its goal says explicitly; it only ever has beliefs about particular aspects of the goal.

Comment author: RobbBB 07 September 2013 06:51:03PM *  0 points [-]

But in a system designed to follow an explicitly specified goal, it does make sense to talk of its goal apart from its behavior. Even though its behavior will reflect its goal, the goal itself will reflect itself better.

Perhaps, but I suspect that for most possible AIs there won't always be a fact of the matter about where its preference is encoded. The blue-minimizing robot is a good example. If we treat it as a perfectly rational agent, then we might say that it has temporally stable preferences that are very complicated and conditional; or we might say that its preferences change at various times, and are partly encoded, for instance, in the properties of the color-inverting lens on its camera. An AGI's response to environmental fluctuation will probably be vastly more complicated than a blue-minimizer's, but the same sorts of problems arise in modeling it.

I think it's more useful to think of rational-choice-theory-style preferences as useful theoretical constructs -- like a system's center of gravity, or its coherently extrapolated volition -- than as real objects in the machine's hardware or software. This sidesteps the problem of haggling over which exact preferences a system has, how those preferences are distributed over the environment, how to decide between causally redundant encodings which is 'really' the preference encoding, etc. See my response to Dave.

Comment author: Vladimir_Nesov 07 September 2013 08:17:40PM 2 points [-]

"Goal" is a natural idea for describing AIs with limited resources: these AIs won't be able to make optimal decisions, and their decisions can't be easily summarized in terms of some goal, but unlike the blue-minimizing robot they have a fixed preference ordering that doesn't gradually drift away from what it was originally, and eventually they tend to get better at following it.

For example, if a goal is encrypted, and it takes a huge amount of computation to decrypt it, the system's behavior prior to that point won't depend on the goal, but it's going to work on decrypting it and eventually will follow it. This encrypted goal is probably more predictive of long-term consequences than anything else in the details of the original design, but it also doesn't predict its behavior during the first stage (and if there is only a small probability that all resources in the universe will allow decrypting the goal, it's probable that the system's behavior will never depend on the goal). Similarly, even if there is no explicit goal, as in the case of humans, it might be possible to work with an idealized goal that, like the encrypted goal, can't be easily evaluated, and so won't influence behavior for a long time.
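
A toy rendering of the encrypted-goal case (the XOR "encryption", the digest check, and the names used here are stand-ins for illustration, not a claim about real designs):

    import hashlib

    SECRET_KEY = 0x5A                       # not available to the decision procedure below
    PLAINTEXT_GOAL = b"maximize_paperclips"

    def xor_bytes(data, key):
        return bytes(b ^ key for b in data)

    ENCRYPTED_GOAL = xor_bytes(PLAINTEXT_GOAL, SECRET_KEY)
    GOAL_DIGEST = hashlib.sha256(PLAINTEXT_GOAL).hexdigest()  # lets the system recognize success

    def step(state):
        """One decision step. Until decryption succeeds, the chosen action is
        'work on decryption', no matter what the encrypted goal actually says."""
        if state["decrypted_goal"] is None:
            candidate = xor_bytes(ENCRYPTED_GOAL, state["next_key"])
            if hashlib.sha256(candidate).hexdigest() == GOAL_DIGEST:
                state["decrypted_goal"] = candidate.decode()
            else:
                state["next_key"] += 1
            return "working on decryption"
        return "pursuing: " + state["decrypted_goal"]

    state = {"decrypted_goal": None, "next_key": 0}
    for _ in range(0x5B):       # the behavior over this whole prefix is the same whatever the goal says
        step(state)
    print(step(state))          # pursuing: maximize_paperclips

Up to the step where the key is found, the observed behavior gives no evidence about which goal is encrypted.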

My point is that there are natural examples where goals and the character of behavior don't resemble each other, so that each can't be easily inferred from the other, while both can be observed as aspects of the system. It's useful to distinguish these ideas.

Comment author: RobbBB 07 September 2013 09:37:18PM *  0 points [-]

I agree preferences aren't reducible to actual behavior. But I think they are reducible to dispositions to behave, i.e., behavior across counterfactual worlds. If a system prefers a specific event Z, that means that, across counterfactual environments you could have put it in, the future would on average have had more Z the more its specific distinguishing features had a large and direct causal impact on the world.
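
As a rough sketch of what I mean operationally (the toy dynamics and the names run, painter, mostly_blue, and disposition_toward are all hypothetical, just to fix ideas): compare how often Z obtains across sampled environments with the system present versus absent.

    import random

    def run(env, system, steps=80):
        """Toy dynamics: each step the environment randomly paints or unpaints one
        object; if a system is present, it also gets to act on the world."""
        size, blue = env["size"], env["blue"]
        rng = random.Random(env["seed"])
        for _ in range(steps):
            blue = max(0, min(size, blue + rng.choice([-1, 0, 1])))
            if system is not None:
                blue = min(size, blue + system(blue, size))
        return {"size": size, "blue": blue}

    def painter(blue, size):
        return 1  # this system's distinguishing feature: it paints one more thing blue each step

    def mostly_blue(final_world):
        """The event Z: most things end up blue."""
        return final_world["blue"] > final_world["size"] / 2

    def disposition_toward(event_z, system, environments):
        """Average frequency of Z with the system present, minus with it absent."""
        with_s = sum(event_z(run(env, system)) for env in environments)
        without_s = sum(event_z(run(env, None)) for env in environments)
        return (with_s - without_s) / len(environments)

    envs = [{"size": 100, "blue": 0, "seed": i} for i in range(200)]
    print(disposition_toward(mostly_blue, painter, envs))  # close to 1.0: strong disposition toward Z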

Comment author: Vladimir_Nesov 08 September 2013 01:06:05PM *  0 points [-]

The examples I used seem to apply to "dispositions" to behave, in the same way (I wasn't making this distinction). There are settings where the goal can't be clearly inferred from behavior, or from a collection of hypothetical behaviors in response to various environments, at least if we keep environments relatively close to what might naturally occur, even as in those settings the goal can be observed "directly" (defined as an idealization based on the AI's design).

An AI with an encrypted goal (i.e. the AI itself doesn't know the goal in explicit form, but the goal can be abstractly defined as the result of decryption) won't behave in accordance with it in any environment that doesn't magically let it decrypt its goal quickly; there is no tendency to push events towards what the encrypted goal specifies, until the goal is decrypted (which might be never, with high probability).

Comment author: RobbBB 09 September 2013 05:59:32PM *  0 points [-]

I don't think a sufficiently well-encrypted 'preference' should be counted as a preference for present purposes. In principle, you can treat any physical chunk of matter as an 'encrypted preference', because if the AI just were a key of exactly the right shape, then it could physically interact with the lock in question to acquire a new optimization target. But if neither the AI nor anything very similar to the AI in nearby possible worlds actually acts as a key of the requisite sort, then we should treat the parts of the world that a distant AI could interact with to acquire a preference as, in our world, mere window dressing.

Perhaps if we actually built a bunch of AIs, and one of them was just like the others except where others of its kind had a preference module, it had a copy of The Wind in the Willows, we would speak of this new AI as having an 'encrypted preference' consisting of a book, with no easy way to treat that book as a decision criterion like its brother- and sister-AIs do for their homologous components. But I don't see any reason right now to make our real-world usage of the word 'preference' correspond to that possible world's usage. It's too many levels of abstraction away from what we should be worried about, which are the actual real-world effects different AI architectures would have.