As to not understanding the argument - that's understandable, because this is a long and dense paper.
If you are trying to summarize the whole paper when you say "if we succeed to make the Friendly AI perfectly on the first attempt, then we do not have to worry about what could go wrong, because the perfect Friendly AI would not do anything stupid", then that would not be right. The argument includes a statement that resembles that, but only as an aside.
As to your question about what happens next, or what happens if we only get the "Friendly" part 90% correct .... well, you are dragging me off into new territory, because that was not really within the scope of the paper. Don't get me wrong: I like being dragged off into that territory! But there just isn't time to write down and argue the whole domain of AI friendliness all in one sitting.
The preliminary answer to that question is that everything depends on the details of the motivation system design, and my feeling (as a designer of AGI motivation systems) is that beyond a certain point the system is self-stabilizing. That is, it will understand its own limitations and try to correct them.
But that last statement tends to get (some other) people inflamed, because they do not realize that it comes within the "swarm relaxation" context, and they misunderstand the manner in which a system would self-correct. Although I said a few things about swarm relaxation in the paper, I did not give enough detail to be able to address this whole topic here.
As you wrote, the second point filled in the missing part from the first: it uses its background contextual knowledge.
You say you are unsure what this means. That leaves me a little baffled, but here goes anyway. Suppose I asked a person, today, to write a book for me on the subject of "What counts as an action that is significant enough that, if you did that action in a way that would affect people, it would rise above some level of 'nontrivialness' and you should consult them first? Include in your answer a long discussion of the kind of thought processes you went through to come up with your answers." I know many articulate people who could, if they had the time, write a massive book on that subject.
Now, that book would contain a huge number of constraints (little factoids about the situation) about "significant actions", and the SOURCE of that long list of constraints would be .... the background knowledge of the person who wrote the book. They would call upon a massive body of knowledge about many aspects of life, to organize their thoughts and come up with the book.
If we could look into the head of the person who wrote the book, we could find that background knowledge. It would be similar in size to the number of constraints mentioned in the book, or it would be larger.
That background knowledge -- both its content AND its structure -- is what I refer to when I talk about the AI using contextual information or background knowledge to assess the degree of significance of an action.
You go on to ask a bizarre question:
This would be an example of an intelligent system sitting there with that massive array of contextual/background knowledge that could be deployed ...... but instead of using that knowledge to make a preliminary assessment of whether "shooting first" would be a good idea, it ignores ALL OF IT and substitutes one single constraint taken from its knowledge base or its goal system:
It would entirely defeat the object of using large numbers of constraints in the system, to use only one constraint. The system design is (assumed to be) such that this is impossible. That is the whole point of the Swarm Relaxation design that I talked about.
My bizarre question was just an illustrative example. It seems neither you nor I believe that would be an adequate criterion (though perhaps for different reasons).
If I may translate what you're saying into my own terms, you're saying that for a problem like "shoot first or ask first?" the criteria (i.e., constraints) would be highly complex and highly contextual. Ok. I'll grant that's a defensible design choice.
Earlier in the thread you said
This is why I have homed in on scenarios where the AI has not yet received feedback on its plan. In these scenarios, the AI presumably must decide (even if the decision is only implicit) whether to consult humans about its plan first, or to go ahead with its plan first (and halt or change course in response to human feedback). To lay my cards on the table, I want to consider three possible policies the AI could have regarding this choice.
Can you let me know: have I understood you correctly? More importantly, do you agree with my framing of the dilemma for the AI? Do you agree with my assessment of the pitfalls of each of the 3 policies?