ChristianKl comments on Open thread, September 2-8, 2013 - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are basically saying that the AI should be unable to learn to trust that a process which was effective in the past will also be effective in the future. I think that would restrict intelligence a lot.
Yeah, that's a good point. What I want to say is, "oh, a non-self-modifying AI would still be able to hand off control to a sub-AI, but it will automatically check to make sure the sub-AI is behaving correctly; it won't be able to turn off those checks". But my idea here is definitely starting to feel more like a pipe dream.
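The handoff idea above could be sketched roughly as follows. This is only an illustrative toy, not a real proposal; all names (`Supervisor`, `EchoAgent`, `invariant`) are hypothetical, and the hard part — choosing a check that actually captures "behaving correctly" — is exactly what the discussion says is unsolved:

```python
# Toy sketch: a non-self-modifying supervisor delegates to a sub-agent,
# but every proposed action must pass a fixed check before execution.
# Because the check sits on the only execution path, the sub-agent
# cannot turn it off. (All names here are hypothetical illustrations.)

class Supervisor:
    def __init__(self, sub_agent, invariant):
        self._sub_agent = sub_agent    # delegate that proposes actions
        self._invariant = invariant    # fixed predicate the supervisor enforces

    def step(self, observation):
        action = self._sub_agent.propose(observation)
        # Mandatory check: there is no code path around this gate.
        if not self._invariant(action):
            raise RuntimeError(f"sub-agent proposed disallowed action: {action!r}")
        return action

class EchoAgent:
    """Trivial sub-agent that proposes whatever it observes."""
    def propose(self, observation):
        return observation

sup = Supervisor(EchoAgent(), invariant=lambda a: a != "disable_checks")
print(sup.step("move_north"))  # an allowed action passes through the gate
```

Of course, this only pushes the problem into the `invariant` predicate, which is the point made in the reply below: checking a smarter system reduces to being able to specify what "correct" means.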
Hmm, there might still be something to glean from attempting to steelman this, or from working in different related directions.
Edit: maybe something with an AI not being able to tolerate things it can't make certain proofs about? The problem is that it would have to be able to make those proofs about humans if they are included in its environment, and if they are not, it might produce UFAI there (intuition pump: a system consisting of a program it can prove everything about, plus humans that program asks questions to). Yeah, this doesn't seem very useful.
You can't really tell whether something that is smarter than yourself is behaving correctly. In the end, a non-self-modifying AI checking whether a self-modifying sub-AI is behaving correctly isn't much different, from a safety perspective, than a human checking whether the self-modifying AI is behaving correctly.