
fractalman comments on Failed Utopia #4-2 - Less Wrong

52 points - Post author: Eliezer_Yudkowsky - 21 January 2009 11:04AM

Comment author: fractalman 30 May 2013 03:42:10AM *  11 points [-]

I wish that the future will turn out in such a way that I do not regret making this wish

... wish granted. The genie just removed the capacity for regret from your mind. MWAHAHAH!

Comment author: Eliezer_Yudkowsky 30 May 2013 05:18:11AM 9 points [-]

Easier to do by just squishing someone, actually.

Comment author: Will_Newsome 30 May 2013 06:09:17AM *  6 points [-]

If a genie cares enough about your request to interpret and respond to its naive denotation, it also cares enough to interpret your request's obvious connotations. The apparently fine line between them is a human construction. Your proposed interpretation only makes sense if the genie is a rules-lawyer with at-least-instrumentally-oppositional interests/incentives, in which case one wonders where those oppositional interests/incentives came from. (Which is where we're supposed to bring in Omohundro et cetera but meh.)

Comment author: ciphergoth 30 May 2013 06:28:12AM 16 points [-]

Right, if you want a world that's all naive denotation, zero obvious connotation, that's computer programming!
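
A minimal sketch of that quip in Python (a hypothetical example, not from the thread; the function name and the "intended" behaviour are invented): the program honours only the literal instruction, not the obvious connotation behind it.

```python
# Hypothetical example: the program honours the literal instruction only.
# The unstated intent is "drop repeats but keep my ordering"; the code was
# never told that, so it does exactly (and only) what was asked.

def remove_duplicates(items):
    return list(set(items))   # literally "remove duplicates", nothing more

print(remove_duplicates([3, 1, 3, 2]))  # likely [1, 2, 3]: the connotation ("keep my order") was never honoured
```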

Comment author: wedrifid 30 May 2013 08:42:21AM 4 points [-]

If a genie cares enough about your request to interpret and respond to its naive denotation, it also cares enough to interpret your request's obvious connotations.

That doesn't follow. There just isn't any reason that the former implies the latter. Either kind of caring is possible but they are not the same thing (and the second is likely more complex than the first).

Your proposed interpretation only makes sense if the genie is a rules-lawyer

This much is true. (Or at least it must be something that follows rules.)

with at-least-instrumentally-oppositional interests/incentives

This isn't required. It needs no oppositional interests/incentives at all beyond the desire to honour a request once it has been given. This isn't a genie trying to thwart someone in order to achieve some other goal, nor one twisting the intent to serve some other purpose. It is just a genie that cares only about the request, and some jackass asking for something they don't want. (Rather than 'oppositional' it could be called 'obedient', where it turns out that isn't what is desired.)

in which case one wonders where those oppositional interests/incentives came from.

Presumably it got its wish-granting motives from whoever created it or otherwise constructed the notion of the wish-granter genie.

Comment author: Kawoomba 30 May 2013 08:48:31AM 0 points [-]

Presumably it got its wish-granting motives from whoever created it or otherwise constructed the notion of the wish-granter genie.

Why would there be some creating agency involved any more than we need a "whoever" to explain where human characteristics come from?

Comment author: Will_Newsome 30 May 2013 09:01:15AM *  10 points [-]

There just isn't any reason that the former implies the latter. Either kind of caring is possible but they are not the same thing (and the second is likely more complex than the first).

(Very hastily written:) The former doesn't imply the latter; it's just that interpreting denotation and interpreting connotation are within an order of magnitude of each other in difficulty, and they aren't going to be represented by a djinn or an AGI as two distinct classes of interpretation: there's no natural boundary between them. I mean, I guess the fables can make the djinns weirdly stunted in that way, but then the analogy to AGIs breaks down, because interpreting denotation but not connotation is unnatural and you'd have to go out of your way to make an AGI that does that. By hypothesis the AGI is already interpreting natural speech, not compiling code. You can argue that denotation and connotation actually are totally different beasts and we should expect minds-in-general to treat them that way, but my impression is that what we know of linguistics suggests that isn't the case. (ETA: And even just interpreting the "denotation" requires a lot of context already, obviously; why are we taking that subset of context for granted while leaving out only the most important context? That makes sense for a moralistic djinn fable; it doesn't make sense by analogy to AGI.) (ETA2: Annoyed that this purely epistemic question is going to get bogged down in and interpreted in the light of political boo- / yay-AI-risk-prevention stances, arguments-as-soldiers style.)

Comment author: wedrifid 31 May 2013 08:57:52AM *  2 points [-]

The former doesn't imply the latter; it's just that interpreting denotation and interpreting connotation are within an order of magnitude of each other in difficulty

This much is true. It is somewhat more difficult to implement a connotation honouring genie (because that requires more advanced referencing and interpretation), but both tasks fall under already defined areas of narrow AI. The difference in difficulty is small enough that I more or less ignore it as a trivial 'implementation detail'. People could create (either as fiction or as AI) either of these things, and each would have different problems.

Annoyed that this purely epistemic question is going to get bogged down in and interpreted in the light of political boo- / yay-AI-risk-prevention stances, arguments-as-soldiers style.

Your mind reading is in error. To be honest this seems fairly orthogonal to AI-risk-prevention stances. From what I can tell, someone with a particular AI stance hasn't got an incentive either way, because both these types of genie are freaking dangerous in their own way. The only difference acknowledging the possibility of connotation honouring genies makes is perhaps to determine which particular failure mode you potentially end up in. Having a connotation honouring genie may be an order of magnitude safer than a literal genie, but unless there is almost-FAI-complete code in there in the background as a safeguard it's still something I'd only use if I was absolutely desperate. I round off the safety difference between the two to negligible in approximately the same way I round off the implementation difficulty difference.

As a 'purely epistemic question' your original claim is just plain false. However, there is another valid point in the vicinity, one which we have both skirted around the edges of explaining adequately. I (think that I) more or less agree with what you are saying in this follow-up comment. I suggest that the main way that AI interest influences this conversation is that it promotes (and is also caused by) interest in being accurate about precisely what the expected outcomes of goal systems are and just what the problems of a given system happen to be.

Comment author: Will_Newsome 31 May 2013 12:26:43PM *  4 points [-]

Your mind reading is in error.

Sorry, didn't mean to imply you'd be the one mind-killed, just the general audience. From previous interactions I know you're too rational for that kind of perversion.

Having a connotation honouring genie may be an order of magnitude safer than a literal genie

I actually think it's many, many orders of magnitude safer, but that's only because a denotation honoring genie is just egregiously stupid. A connotation honoring genie still isn't safe unless "connotation-honoring" implies something at least as extensive and philosophically justifiable as causal validity semantics. I honestly expect the average connotation-honoring genie will lie in-between a denotation-honoring genie and a bona fide justifiable AGI—i.e., it will respect human wishes about as much as humans respect, say, alligator wishes, or the wishes of their long-deceased ancestors. On average I expect an Antichrist, not a Clippy. But even if such an AGI doesn't kill all of us and maybe even helps us on average, the opportunity cost of such an AGI is extreme, and so I nigh-wholeheartedly support the moralistic intuitions that traditionally lead people to use djinn analogies. Still, I worry that the underlying political question really is poisoning the epistemic question in a way that might bleed over into poor policy decisions re AGI. (Drunk again, apologies for typos et cetera.)

Comment author: wedrifid 31 May 2013 04:47:49PM -1 points [-]

Sorry, didn't mean to imply you'd be the one mind-killed, just the general audience. From previous interactions I know you're too rational for that kind of perversion.

Thank you for your generosity but in all honesty I have to deny that. I at times notice in myself the influence of socio-political incentives. I infer from what I do notice (and, where appropriate, resist) that there are other influences that I do not detect.

I honestly expect the average connotation-honoring genie will lie in-between a denotation-honoring genie and a bona fide justifiable AGI—i.e., it will respect human wishes about as much as humans respect, say, alligator wishes, or the wishes of their long-deceased ancestors.

That seems reasonable.

But even if such an AGI doesn't kill all of us and maybe even helps us on average, the opportunity cost of such an AGI is extreme, and so I nigh-wholeheartedly support the moralistic intuitions that traditionally lead people to use djinn analogies.

I agree that there is potentially significant opportunity cost, but perhaps if anything it sounds like I may be more willing to accept this kind of less-than-ideal outcome. For example, if right now I were forced to choose between accepting this failed utopia, based on a fully connotation-honouring artificial djinn, and leaving things exactly as they are, I suspect I would accept it. It fails as a utopia but it may still be better than the (expected) future we have right now.

Comment author: TheDude 31 May 2013 08:12:32PM 2 points [-]

I think you have a point, Will (an AI that interprets speech like a squish djinn would require deliberate effort and is proposed by no one), but I think that it is possible to construct a valid squish djinn/AI analogy (a squish djinn interpreting a command would be roughly analogous to an AI that is hard-coded to execute that command).

Sorry to everyone for the repetitive statements and the resulting wall of text (which unexpectedly needed to be posted as multiple comments since it was too long). Predicting how people will interpret something is non-trivial, and explaining concepts redundantly is sometimes a useful way of making people hear what you want them to hear.

"Squish djinn" is here used to denote a mind that honestly believes that it was actually instructed to squish the speaker (in order to remove regret, for example), not a djinn that wants to hurt the speaker and is looking for a loophole. The squish djinn only cares about doing what it is requested to do, and does not care at all about the well-being of the requester, so it could certainly be referred to as hostile to the speaker (since it will not hesitate to hurt the speaker in order to achieve its goal (of fulfilling the request)). A cartoonish internal monologue of the squish djinn would be: "the speaker clearly does not want to be squished, but I don't care what the speaker wants, and I see no relation between what the speaker wants and what it is likely to request, so I determine that the speaker requested to be squished, so I will squish" (which sounds very hostile, but contains no will to hurt the speaker).

The typical story djinn is unlikely to be a squish djinn (they usually have a motive to hurt or help the speaker, but are restricted by rules (a clever djinn that wants to hurt the speaker might still squish, but not for the same reasons as a squish djinn (such a djinn would be a valid analogy when opposing a proposal of the type "let's build some unsafe mind with selfish goals and impose rules on it" (such a project can never succeed, and the proposer is probably fundamentally confused, but a simple and correct and sufficient counter-argument is: "if the project did succeed, the result would be very bad")))).
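
A toy sketch of that internal monologue in code, assuming a deliberately simplified world model and hypothetical action names: the only thing that gets checked is whether the parsed request comes true, and the speaker's welfare appears nowhere in the check.

```python
# Hypothetical toy model of the "squish djinn" described above.
# Speech parsing is stubbed out; the parsed request is hard-coded for brevity.

def parse_request(utterance):
    """Literal parse: the request is taken to mean 'make regret impossible'."""
    return lambda world: not world["speaker_can_regret"]

candidate_actions = {
    "carefully edit the speaker's mind": {"speaker_alive": True,  "speaker_can_regret": False},
    "squish the speaker":                {"speaker_alive": False, "speaker_can_regret": False},
}

goal = parse_request("I wish that I will not regret making this wish")
satisfying = [action for action, outcome in candidate_actions.items() if goal(outcome)]

# Both actions satisfy the parsed request; the speaker's welfare never enters the
# scoring, so nothing breaks the tie in the speaker's favour.
print(satisfying)
```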

Comment author: TheDude 31 May 2013 08:12:57PM 0 points [-]

To expand on you having a point: I have obviously not seen every AI proposal on the internet, but as far as I know, no one is proposing to build a wish-granting AI that parses speech like a squish djinn (and ending up with such an AI would require a deliberate effort). So I don't think the squish djinn is a valid argument against proposed wish-granting AIs. Any proposed or realistic speech-interpreting AI would (as you say) parse English speech as English speech. An AI that makes arbitrary distinctions between different types of meaning would need serious deliberate effort, and as far as I know, no one is proposing to do this. This makes the squish djinn analogy invalid as an argument against proposals to build a wish-granting AI.

It is a basic fact that statements do not have specified "meanings" attached to them, and AI proposals take this into account. An extreme example to make this very clear would be Bill saying "Steve is an idiot" to two listeners, where one listener will predictably think of one Steve and the other listener will predictably think of some other Steve (or a politician making a speech that different demographics will interpret differently and to their own liking). Bill (or the politician) does not have a specific meaning in mind as to which Steve (or which message) they are referring to. This speaker is deliberately making a statement in order to have different effects on different audiences.

Another standard example is responding to a question about the location of an object with "look behind you" (anyone that is able to understand English and has no serious mental deficiencies would be able to guess that the meaning is that the object is/might be behind them (as opposed to following the order, being surprised to see the object lying there, and thinking "what a strange coincidence")). Building an AI that would parse "look behind you" without understanding that the person is actually saying "it is/might be behind you" would require deliberate effort, as it would be necessary to painstakingly avoid using most information while trying to understand speech. Tone of voice, body language, eye gaze, context, prior knowledge of the speaker, models of people in general, etc, etc, all provide valuable information when parsing speech. And needing to prevent an AI from using this information (even indirectly, for example through models of "what sentences usually mean") would put enormous additional burdens on an AI project.

An example in the current context would be writing: "It is possible to communicate in a way so that one class of people will infer one meaning and take the speaker seriously and another class of people will infer another meaning and dismiss it as nonsense. This could be done by relying on the fact that people differ in their prior knowledge of the speaker and in their ability to understand certain concepts. One can use non-standard vocabulary, take non-standard strong positions, describe uncommon concepts, or otherwise give signals indicating that the speaker is a person that should not be taken seriously, so that the speaker is dismissed by most people as talking nonsense. But people that know the speaker would see a discrepancy and look closer (and if they are familiar with the non-standard concepts behind all the "don't listen to me" signs they might infer a completely different message)."

To expand on the valid AI squish djinn analogy: I think that hard-coding an AI to execute a command is practically impossible. But if it did succeed, the AI would act sort of like a squish djinn given that command. And this argument/analogy is a valid and sufficient argument against trying to hard-code such a command, making it relevant as long as there exist people who propose to hard-code such commands.

If someone tried to hard-code an AI to execute such a command, and they succeeded in creating something that had a real-world impact, I predict this would represent a failure to implement the command (it would result in an AI that does something other than what the squish djinn would do, and something other than what the builders expect it to do). So the squish djinn is not a realistic outcome. But it is what would happen if they succeeded, and thus the squish djinn analogy is a valid argument against "command hard-coding" projects. I can't predict what such an AI would actually do, since that depends on how the project failed. Intuitively, the situation where confused researchers fail to build a squish djinn does not feel very optimal, but making an argument on this basis is more vague, and requires that the proposing researchers accept their own limited technical ability (saying "doing x is clearly technically possible, but you are not clever enough to succeed" to the typical enthusiastic project proposer (who considers themselves to be clever enough to maybe be the first in the world to create a real AI) might not be the most likely argument to succeed (here I assume that the intent is to be understood, and not to lay the groundwork for later smugly saying "I pointed that out a long time ago" (if one later wants to be smug, then one should optimize for being loud, taking clear and strong positions, and not being understood))).

The squish djinn analogy is simply a simpler argument. "Either you fail or you get a squish djinn" is true and simple and sufficient to argue against a project. When presenting this argument, you do spend most of the time arguing about what would happen in a situation that will never actually happen (project success). This might sound very strange to an outside observer, but the strangeness is introduced by the project proposer's (invalid) assumption that the project can succeed (analogous to some atheist saying: "if god exists, and is omnipotent, then he is not nice, cuz there is suffering").

Comment author: TheDude 31 May 2013 08:13:23PM 0 points [-]

(I'm arrogantly/wisely staying neutral on the question of whether or not it is at all useful to in any way engage with the sort of people whose project proposals can be validly argued against using squish djinn analogies)

(Jokes often work by deliberately being understood in different ways at different times by the same listener (the end of the joke deliberately changes the interpretation of the beginning of the joke (in a way that makes fun of someone)). In this case the meaning of the beginning of the joke is not one thing or the other thing. The listener is not first failing to understand what was said and then, after hearing the end, succeeding in understanding it. The speaker intends the listener to understand the first meaning until reaching the end, so the listener is not "first failing to encode the transmission". There is no inherently true meaning of the beginning of the joke, no inherently true person that this speaker is actually truly referring to. Just a speaker that intends to achieve certain effects on an audience by saying things (and if the speaker is successful, then at the beginning of the joke the listener infers a different meaning from what it infers after hearing the end of the joke). One way to illuminate the concepts discussed above would be to write: "on a somewhat related note, I once considered creating the username "New_Willsome" and starting to post things that sounded like you (for the purpose of demonstrating that if you counter a ban by using sock puppets, you lose your ability to stop people from speaking in your name (I was considering the option of actually acting like I think you would have acted, the option of including subtle distortions to what I think you would have said, and the option of doing my best to give better explanations of the concepts that you talk about)). But then a bunch of usernames similar to yours showed up and were met with hostility, and I was in a hurry, and drunk, and bat shit crazy, and God told me not to do it, and I was busy writing fanfic, so I decided not to do it (the last sentence is jokingly false. I was not actually in a hurry ... :) ... )")

Comment author: MugaSofer 30 May 2013 10:29:13AM *  4 points [-]

Actually, I think Will has a point here.

"Wishes" are just collections of coded sounds intended to help people deduce our desires. Many people (not necessarily you, IDK) seem to model the genie as attempting to attack us while maintaining plausible deniability that it simply misinterpreted our instructions, which, naturally, does occasionally happen because there's only so much information in words and we're only so smart.

In other words, it isn't trying to understand what we mean; it's trying to hurt us without dropping the pretense of trying to understand what we mean. And that's pretty anthropomorphic, isn't it?

Comment author: private_messaging 30 May 2013 12:34:32PM *  4 points [-]

Yes, that's the essence of it. People do it all the time. Generally, all sorts of pseudoscientific scammers try to maintain an image of honest self-deception; in medical scams in particular, the crime is just so heinous and utterly amoral (killing people for cash) that pretty much everyone goes well out of their way to be able to pretend at ignorance, self-deception, misinterpretation, carelessness and enthusiasm. But why would some superhuman AI need plausible deniability?

Comment author: nshepperd 30 May 2013 02:08:31PM 3 points [-]

If your genie is using your vocal emissions as information toward the deduction of your extrapolated volition, then I'd say your situation is good.

Your problems start if it works more by attempting to extract a predicate from your sentence by matching vocal signals against known syntax and dictionaries, and then outputting an action that maximises the probability of that predicate being true with respect to reality.

To put it simply, I think that "understanding what we mean" is really a complicated notion that involves knowing what constitutes true desires (as opposed to, say, akrasia), and of course having a goal system that actually attempts to realize those desires.
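
A rough sketch of the contrast drawn above, with hypothetical action names, outcomes, and weights (purely illustrative, not anyone's proposed design): the first genie picks whatever makes the parsed predicate true, while the second treats the request as evidence about the speaker's underlying preferences and keeps their welfare in the objective.

```python
# Hypothetical contrast between the two designs described above.
# Action names, outcomes, and weights are invented for illustration.

def literal_genie(actions):
    """Match the words to a predicate, then pick actions that make it true."""
    predicate = lambda outcome: outcome["regret"] == 0   # parsed from the words alone
    satisfying = [a for a, outcome in actions.items() if predicate(outcome)]
    return satisfying  # both qualify; the speaker's survival is not part of the criterion

def volition_genie(actions, speaker_model):
    """Treat the utterance as evidence about what the speaker actually wants."""
    def inferred_utility(outcome):
        return speaker_model["values_survival"] * outcome["alive"] - outcome["regret"]
    return max(actions, key=lambda a: inferred_utility(actions[a]))

actions = {
    "edit preferences gently": {"alive": 1, "regret": 0},
    "squish":                  {"alive": 0, "regret": 0},
}
speaker = {"values_survival": 10}

print(literal_genie(actions))            # ['edit preferences gently', 'squish'] -- no tie-breaking in your favour
print(volition_genie(actions, speaker))  # 'edit preferences gently'
```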

Comment author: ThrustVectoring 04 June 2013 07:31:08PM 1 point [-]

at-least-instrumentally-oppositional interests/incentives, in which case one wonders where those oppositional interests/incentives came from.

All you need is a cost function. If the genie prefers achieving goals sooner rather than later, squishing you is a 'better' solution along that direction to remove your capacity for regret. Or if it prefers using less effort rather than more. Etc.
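
A worked toy example of that cost-function point, with made-up numbers and action names: both actions satisfy the parsed goal, so the time and effort penalties alone decide which one wins.

```python
# Hypothetical numbers: both actions make the parsed goal true, so the
# time and effort penalties in the cost function decide which one "wins".

actions = {
    "carefully remove regret from the mind": {"satisfies_goal": True, "time": 100.0, "effort": 50.0},
    "squish the wisher":                     {"satisfies_goal": True, "time": 1.0,   "effort": 1.0},
}

def score(outcome, time_weight=0.1, effort_weight=0.1):
    goal_value = 1.0 if outcome["satisfies_goal"] else 0.0
    return goal_value - time_weight * outcome["time"] - effort_weight * outcome["effort"]

best = max(actions, key=lambda a: score(actions[a]))
print(best)  # "squish the wisher": the faster, cheaper route to the same predicate scores higher
```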

Comment author: fractalman 30 May 2013 06:32:48AM 2 points [-]

Well, yes, that is one way to remove the capacity for regret...

I mentally merged the possibility pump and the Mehtopia AI... say, a sloppy code mistake, or a premature compile-and-run, resulting in the "do not tamper with minds" rule not getting incorporated correctly, even though "don't kill humans" gets incorporated.
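
A toy sketch of that failure mode, with hypothetical rule names: one safety rule is written but, through a sloppy edit, never registered in the active constraint set, so actions it should have blocked pass the filter.

```python
# Hypothetical rule names. The mind-tampering rule is written but, through a
# sloppy edit or a premature compile-and-run, never added to the active set.

def no_killing(action):
    return not action.get("kills_human", False)

def no_mind_tampering(action):
    return not action.get("tampers_with_mind", False)

ACTIVE_RULES = [no_killing]   # oops: no_mind_tampering was never registered

def permitted(action):
    return all(rule(action) for rule in ACTIVE_RULES)

print(permitted({"kills_human": True}))        # False -- correctly blocked
print(permitted({"tampers_with_mind": True}))  # True  -- slips straight through
```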

Comment author: AlexanderRM 08 October 2015 09:24:32PM 0 points [-]

I assume what Will_Pearson meant to say was "would not regret making this wish", which fits with the specification of "I is the entity standing here right now". Basically such that: if, before finishing/unboxing the AI, you had known exactly what would result from doing so, you would still have built the AI. (And it's supposed to find, out of that set of possible worlds, the one you would most like, or... something along those lines.) I'm not sure that would rule out every bad outcome, but... I think it probably would. Besides the obvious "other humans have different preferences from the guy building the AI" (maybe the AI is ordered to do a similar thing for each human individually), can anyone think of ways this would go badly?