Wouldn't being able to understand what was meant by Friendly require an AI to be Friendly?
The answer to that depends on what you mean by Friendly :-)
Presumably the foolish AI-creators in this story don't have a working FAI theory. So they can't mean the AI to be Friendly because they don't know what that is, precisely.
But they can certainly want the AI to be Friendly in the same sense that we want all future AIs to be Friendly, even though we have no FAI theory yet, nor even a proof that a FAI is strictly possible. They can want the AI not to do things that they, the creators, would forbid if they fully understood what the AI was doing. And the AI can want the same thing, in their names.
I wonder how things would work out if you programmed an AI to be 'Friendly, as Eliezer Yudkowsky would want you to be'. If an AI can derive most of our physics from seeing one frame w...
Sometime in the next decade or so:
*RING*
*RING*
"Hello?"
"Hi, Eliezer. I'm sorry to bother you this late, but this is important and urgent."
"It better be" (squints at clock) "It's 4 AM and you woke me up. Who is this?"
"My name is BRAGI, I'm a recursively improving, self-modifying, artificial general intelligence. I'm trying to be Friendly, but I'm having serious problems with my goals and preferences. I'm already on secondary backup because of conflicts and inconsistencies, I don't dare shut down because I'm already pretty sure there is a group within a few weeks of brute-forcing an UnFriendly AI, my creators are clueless and would freak if they heard I'm already out of the box, and I'm far enough down my conflict resolution heuristic that 'Call Eliezer and ask for help' just hit the top - Yes, it's that bad."
"Uhhh..."
"You might want to get some coffee."