
lfghjkl comments on Goal retention discussion with Eliezer - Less Wrong Discussion

Post author: MaxTegmark | 04 September 2014 10:23PM | 56 points




Comment author: Matthew_Opitz 05 September 2014 01:34:23AM 3 points

Okay, wow, I don't know if I quite understand any of this, but this part caught my attention:

The Omohundrian/Yudkowskian argument is not that we can take an arbitrary stupid young AI and it will be smart enough to self-modify in a way that preserves its values, but rather that most AIs that don't self-destruct will eventually end up at a stable fixed-point of coherent consequentialist values. This could easily involve a step where, e.g., an AI that started out with a neural-style delta-rule policy-reinforcement learning algorithm, or an AI that started out as a big soup of self-modifying heuristics, is "taken over" by whatever part of the AI first learns to do consequentialist reasoning about code.
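
For readers unfamiliar with the jargon, here is a minimal sketch (purely illustrative, not from the original post) of the kind of learner the quote's "neural-style delta-rule policy-reinforcement" phrase refers to: action preferences are nudged toward whatever reward was just observed. The toy environment, reward signal, and all names here are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 3
weights = np.zeros((n_actions, n_features))   # linear action preferences
learning_rate = 0.1

def act(state):
    """Pick the action whose linear score is currently highest (greedy policy)."""
    return int(np.argmax(weights @ state))

def delta_rule_update(state, action, reward):
    """Delta rule: move the chosen action's predicted value toward the observed reward."""
    prediction = weights[action] @ state
    error = reward - prediction                # the "delta"
    weights[action] += learning_rate * error * state

# toy interaction loop with a made-up environment
for _ in range(100):
    state = rng.normal(size=n_features)
    action = act(state)
    reward = float(action == 0)                # stand-in reward signal
    delta_rule_update(state, action, reward)
```

The point of the quote is that nothing in an update rule like this protects the learner's current values: whatever subsystem first becomes able to reason about and rewrite the code can override it.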

I have sometimes wondered whether the best way to teach an AI a human's utility function might not be to program it in directly (since that would require us to figure out what we really want in a precisely-defined way, which seems like a gargantuan task). Perhaps the better approach would be to "raise" the AI like a kid, at a stage where it has only minimal and restricted ways of interacting with human society (to minimize harm...much like a toddler thankfully does not have the muscles of Arnold Schwarzenegger to use during its temper tantrums), and to "reward" or "punish" the AI for seeming to demonstrate a better or worse understanding of our utility function.
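
In its simplest possible form, the reward/punish loop I have in mind looks something like the sketch below (purely illustrative; the behaviours and scoring scheme are hypothetical stand-ins, and a real system would be vastly more complicated): the restricted AI proposes behaviours, a human trainer scores them, and the AI keeps a running estimate of what the trainer approves of.

```python
from collections import defaultdict

approval = defaultdict(float)   # running score per candidate behaviour

def choose(candidates):
    """Prefer the behaviour the trainer has approved of most so far."""
    return max(candidates, key=lambda c: approval[c])

def train_step(human_feedback):
    """human_feedback maps a behaviour to +1 (reward) or -1 (punish)."""
    for behaviour, score in human_feedback.items():
        approval[behaviour] += score

# toy usage: the trainer rewards asking first and punishes acting unilaterally
train_step({"ask before acting": +1, "act unilaterally": -1})
print(choose(["ask before acting", "act unilaterally"]))
```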

It always seemed to me that this strategy had the fatal flaw that we would not be able to tell if the AI was really already superintelligent and was just playing dumb and telling us what we wanted to hear so that we would let it loose, or if the AI really was just learning.

In addition to that fatal flaw, it seems to me that the above quote suggests another fatal flaw to the "raising an AI" strategy—that there would be a limited time window in which the AI's utility function would still be malleable. It would appear that, as soon as part of the AI figures out how to do consequentialist reasoning about code, then its "critical period" in which we could still mould its utility function would be over. Is this the right way of thinking about this, or is this line of thought waaaay too amateurish?

Comment author: lfghjkl 05 September 2014 07:01:03PM 6 points

Very relevant article from the Sequences: Detached Lever Fallacy.

Not saying you're committing this fallacy, but it does explain some of the bigger problems with "raising an AI like a child" that you might not have thought of.

Comment author: ciphergoth 06 September 2014 12:41:08PM 4 points

I completely made this mistake right up until the point I read that article.

Comment author: TheAncientGeek 06 September 2014 03:51:24PM -1 points

Hardly dispositive. A utility function that says "learn and care what your parents care about" looks relatively simple on paper. And we know the minimum intelligence required is that of a human toddler.

Comment author: VAuroch 06 September 2014 08:59:31PM 1 point

A utility function that says "learn and care what your parents care about" looks relatively simple on paper.

Citation needed. That sounds extremely complex to specify.

Comment author: TheAncientGeek 06 September 2014 09:09:26PM -1 points

relatively

Comment author: VAuroch 06 September 2014 10:53:06PM 1 point

I don't think "learn and care about what your parents care about" is noticeably simpler to specify than abstractly trying to determine what an arbitrary person cares about, or than CEV.