eli_sennesh comments on Open thread, Nov. 24 - Nov. 30, 2014 - Less Wrong Discussion

4 Post author: MrMind 24 November 2014 08:56AM

Comment author: [deleted] 30 November 2014 12:29:56PM 1 point

Why can't we program hard stops into AI, where it is required to pause and ask for further instruction?

Because instructions are words, and "ask for instructions" implies an ability to understand and a desire to follow. The desire to follow instructions according to their givers' intentions is more-or-less a restatement of the Hard Problem of FAI itself: how do we formally specify a utility function that converges to our own in the limit of increasing optimization power and autonomy?

Comment author: TheAncientGeek 30 November 2014 03:18:10PM -2 points

If you are worrying about the dangers of human level or greater AI, you are tacitly taking the problem of natural language interpretation to have been solved, so the above is an appeal to Mysterious Selective Stupidity.

Comment author: [deleted] 30 November 2014 10:08:58PM 1 point

you are tacitly taking the problem of natural language interpretation to have been solved

No, I am not. Just because an AGI can solve the natural-language interpretation problem does not mean the natural-language interpretation problem was solved separately from the AGI problem, in terms of narrow NLP models. In fact, more or less the entire point of AGI is to have a single piece of software to which we can feed any and all learning problems without having to figure out how to model them formally ourselves.

Comment author: TheAncientGeek 15 December 2014 10:43:45AM 0 points

In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is therefore not just machine code, which is tacitly assuming it already has NL abilities.

Yes, it would probably need a motivation to interpret such sentences correctly. But that is an easier problem to solve than coding in the whole of human value. An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.

And interpreting instructions correctly is a subgoal of getting things in general right. Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.

Comment author: [deleted] 15 December 2014 01:35:22PM 1 point

In responding to Brilliant, you were tacitly assuming that the AI has been given instructions in some higher-level language that is subject to differing interpretations, and is therefore not just machine code, which is tacitly assuming it already has NL abilities.

No, I'm insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, "Do what I mean!" and Bob's your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI agent has a utility function coded as program code in a programming language -- which makes desirable behavior quite improbable.

An AI would need to understand human value in order to understand NL, but would not need to be preloaded with all human value, since discovering it would be a subsidiary goal of interpreting NL correctly.

Again: knowing is quite different from caring. What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI's concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.

Unfortunately, this approach rarely works with actual humans, since our concept machinery is horrifically prone to non-natural hypotheses about value, to the point that most of the human race refuses as a matter of principle to consider ethical naturalism a coherent meta-ethical stance, let alone the correct one.

We have some idea of a safe goal function for the AGI (it's essentially a longer-winded version of "Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added"), the question is how to actually program that.

Which is actually an instance of the more general problem: how do we program goals for intelligent agents in terms of any real-world concepts about which there might be incomplete or unformalized knowledge? Without solving that we can basically only build reinforcement learners.

The whole cognitive-scientific lens towards problems is to treat them as learning and inference problems, but that doesn't really help when we need to encode something we're fuzzy about rather than being able to specify it formally.

Building AIs that are epistemic rationalists could be a further simplification of the problem of AI safety. Epistemic rationality is difficult for humans because humans are evolutionary hacks whose goals are spreading their genes, achieving status, etc. It may be excessively anthropomorphic to assume human levels of deviousness in AIs.

If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.
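To make this argument concrete, here is a toy sketch (an illustration, not code from either commenter) of an agent that simply picks whichever action has the highest expected utility under its coded utility function. The action names, probabilities, and both utility functions are hypothetical; the only point is that the argmax step has no built-in preference for honesty.

```python
# Toy expected-utility maximizer. All actions, outcomes, and numbers are
# hypothetical illustrations, not a model of any real AI system.

def expected_utility(outcomes, utility):
    """Sum of probability * utility over (probability, outcome) pairs."""
    return sum(p * utility(o) for p, o in outcomes)

def choose(actions, utility):
    """Return the action name with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

# Each action maps to a list of (probability, outcome) pairs.
actions = {
    "report_honestly": [(0.5, {"goal_met": True, "deceived": False}),
                        (0.5, {"goal_met": False, "deceived": False})],
    "mislead_overseer": [(0.9, {"goal_met": True, "deceived": True}),
                         (0.1, {"goal_met": False, "deceived": True})],
}

def goal_only(o):
    # Coded utility that only scores the goal; deception is invisible to it.
    return 1.0 if o["goal_met"] else 0.0

def goal_with_honesty(o):
    # Same goal, plus an explicit penalty for deception written into the code.
    return goal_only(o) - (2.0 if o["deceived"] else 0.0)

print(choose(actions, goal_only))          # mislead_overseer
print(choose(actions, goal_with_honesty))  # report_honestly
```

The optimizer is identical in both calls; whether the deceptive action wins depends entirely on what the coded utility function scores.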

Comment author: TheAncientGeek 15 December 2014 02:23:53PM 0 points

No, I'm insisting that no realistic AGI at all is a Magic Genie which can be instructed in high-level English. If it were, all I would have to say is, "Do what I mean!" and Bob's your uncle. But since that cannot happen without solving Natural Language Processing as a separate problem before constructing an AGI, the AGI

I was actually agreeing with you that NLP needs to be solved separately if you want to instruct it in English. The rhetoric about magic isn't helpful.

agent has a utility function coded as program code in a programming language -- which makes desirable behavior quite improbable.

I don't see why that would follow, and in fact I argued against it.

knowing is quite different from caring.

I know.

What we could do in this domain is solve natural-language learning and processing separately from AGI, and then couple that to a well-worked-out infrastructure of normative uncertainty, and then, after making absolutely sure that the AI's concept-learning via the hard-wired natural-language processing library matches the way human minds represent concepts computationally, use a large corpus of natural-language text to try to teach the AI what sort of things human beings want.

That's not what I was saying. I was saying an AI with a motivation to understand NL correctly would research whatever human value was relevant.

We have some idea of a safe goal function for the AGI (it's essentially a longer-winded version of "Do what I mean, but taking the interests of all into account equally, and considering what I really mean even under reflection as more knowledge and intelligence are added"), the question is how to actually program that.

That's kind of what I was saying.

If being devious to humans is instrumentally rational, an instrumentally rational AI agent will do it.

Non sequitur. In general, what is an instrumental goal will vary with final goals, and epistemic rationality is a matter of final goals. Omohundran drives are unusual in not having the property of varying with final goals.
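A toy sketch of this distinction, assuming a hypothetical table of final goals and the subgoals that would help each one. A subgoal that helps every final goal behaves like an Omohundro-style drive; the rest vary with the final goal. The goals and subgoals listed are illustrative assumptions only.

```python
# Hypothetical final goals mapped to the subgoals that would help achieve them.
# The specific entries are illustrative assumptions, not claims about real agents.
helps = {
    "prove_theorems": {"acquire_resources", "self_preserve", "read_math_papers"},
    "cure_disease": {"acquire_resources", "self_preserve", "run_trials", "earn_trust"},
    "answer_honestly": {"acquire_resources", "self_preserve", "model_humans_accurately"},
}

def convergent_subgoals(goal_table):
    """Subgoals useful for every final goal (Omohundro-style drives)."""
    return set.intersection(*goal_table.values())

def goal_dependent_subgoals(goal_table):
    """Subgoals useful for some final goals but not for others."""
    return set.union(*goal_table.values()) - convergent_subgoals(goal_table)

print(sorted(convergent_subgoals(helps)))
# ['acquire_resources', 'self_preserve']
print(sorted(goal_dependent_subgoals(helps)))
# ['earn_trust', 'model_humans_accurately', 'read_math_papers', 'run_trials']
```

Whether deceiving humans belongs in the convergent set or the goal-dependent set is exactly what the two commenters disagree about; the toy table assumes an answer rather than settling it.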