Eliezer_Yudkowsky comments on The Hidden Complexity of Wishes - Less Wrong

Post author: Eliezer_Yudkowsky | 24 November 2007 12:12AM | 58 points

Comment author: Eliezer_Yudkowsky 31 August 2013 09:39:33PM 2 points

The trouble is that communicating with a human or helping them build the real FAI in any way is going to strongly perturb the world. So actually getting anything useful this way requires solving the problem of which changes to humans, and consequent changes to the world, are allowed to result from your communication-choices.

Comment author: Kawoomba 31 August 2013 09:55:00PM 0 points

"helping them build the real FAI"

Except it's not, as far as the artificial agent is concerned:

Its goals are strictly limited to "develop your models using the minimal actions possible [even 'just parse the internet, do not use anything beyond wget' could suffice], after x number of years have passed, accept new goals from y source." The new goals could be anything. (It could even be a boat!).
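A rough sketch of that two-phase goal structure, purely illustrative (every name and field below is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PhasedGoal:
    """Hypothetical sketch of the restricted two-phase goal described above:
    a modelling phase with a minimal action set, then a handover of new goals."""
    allowed_actions: Tuple[str, ...] = ("http_get",)  # phase 1: "nothing beyond wget"
    modelling_years: int = 10                         # x: how long phase 1 lasts
    goal_source: str = "y"                            # y: who may supply phase-2 goals
    new_goals: Optional[str] = None                   # left open; "could be anything"

    def accept_new_goals(self, source: str, goals: str) -> None:
        # Only the designated source may hand over the phase-2 goals.
        if source != self.goal_source:
            raise PermissionError("new goals must come from the designated source")
        self.new_goals = goals
```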

The usefulness for FAI only becomes evident at that later stage, when the foomed AI's models are used to parse the new goal of "do that which I'd want you to do". It's sidestepping the big problem (aka "cheating"), but so what?

Comment author: Eliezer_Yudkowsky 31 August 2013 11:28:12PM 5 points

It's allowed to emit arbitrary HTTP GETs? You just lost the game.

Comment author: Kawoomba 01 September 2013 06:41:15AM 0 points

Ah, you mean because you can invoke e.g. PHP functions with wget or inject SQL code, thus gaining control of other computers, etc.?
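To make that concrete: HTTP GET is read-only only by convention, so "just wget" still lets the agent trigger arbitrary server-side behaviour. A minimal sketch, with both endpoints made up:

```python
import urllib.parse
import urllib.request

# GET is side-effect-free only by convention; the server can do anything with it.
# Both endpoints here are hypothetical: one mutates state, one carries an injection payload.
delete_url = "http://example.com/admin/delete_user?id=1"
inject_url = "http://example.com/item?id=" + urllib.parse.quote("1; DROP TABLE users --")

for url in (delete_url, inject_url):
    try:
        urllib.request.urlopen(url, timeout=5)  # the same plain GET that wget would emit
    except OSError:
        pass  # the danger is the request reaching the server, not the reply
```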

A sturdier approach to just getting data would be to allow it only to passively listen in on some Tier 1 provider's backbone (no manipulation of the data flow other than mirroring packets, which is easy to formalize). Once that goal is formulated, the agent wouldn't want to circumvent it.
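A minimal sketch of what "passive only" could look like at the implementation level, assuming a Linux box attached to a mirror port (interface name hypothetical, requires root):

```python
import socket

# An AF_PACKET raw socket (Linux-only) receives copies of frames on an interface.
# Reading from it never puts traffic on the wire, which is the property
# "mirror packets, never manipulate the flow" is trying to pin down.
ETH_P_ALL = 0x0003
sniffer = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(ETH_P_ALL))
sniffer.bind(("eth0", 0))  # hypothetical interface receiving the mirrored traffic

while True:
    frame, _ = sniffer.recvfrom(65535)  # observe only; nothing is ever sent
    # ...hand `frame` to the model-building process...
```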

Still seems plenty easier to solve than "friendliness", as is programming it to ask for new goals after x time. Maintaining invariants under self-modification remains an open task.

It's not fruitful for me to propose implementations (even though I just did, heh) and have someone else point out holes (I don't mean to solve that task in five minutes), just as it isn't fruitful for you to propose a full-fledged implementation of friendliness and have someone else point out holes. Both are non-trivial tasks.

My question is this: given your current interpretation of both approaches ("passively absorb data, ask for new goals after x time" vs. "implement friendliness in the pre-foomed agent outright"), which seems more manageable while still resulting in an FAI?