nshepperd comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comments (515)

Comment author: XiXiDu 05 September 2013 10:58:05AM -1 points [-]

To be better able to respond to your comment, please let me know in what way you disagree with the following comparison between narrow AI and general AI:

Narrow artificial intelligence will be denoted NAI and general artificial intelligence GAI.

(1) Is it in principle capable of behaving in accordance with human intention to a sufficient degree?

NAI: True

GAI: True

(2) Under what circumstances does it fail to behave in accordance with human intention?

NAI: If it is broken, where broken stands for a wide range of failure modes such as incorrectly managing memory allocations.

GAI: In all cases in which it is not mathematically proven to be tasked with the protection of, and equipped with, a perfect encoding of all human values or a safe way to obtain such an encoding.

(3) What happens when it fails to behave in accordance with human intention?

NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If, for example, an autonomous car fails at driving autonomously, it usually either goes into safe mode and halts, or crashes.

GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact, except that it completely fails at working as intended, in such a way as to destroy all human value in the universe. It will be able to improve itself and be capable of obtaining a perfect encoding of human values. It will use those intended capabilities to deceive and overpower humans rather than to do what it was intended to do.

(4) What happens if it is bound to use a limited amount of resources, use a limited amount of space or run for a limited amount of time?

NAI: It will only ever do what it was programmed to do. As long as there is no fatal flaw harming its general functionality, it will work within the defined boundaries as intended.

GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.


Please let me also know where you disagree with the following points:

(1) The abilities of systems are part of human preferences, as humans intend to give systems certain capabilities and, as a prerequisite to building such systems, have to succeed at implementing their intentions.

(2) Error detection and prevention is such a capability.

(3) Something that is not better than humans at preventing errors is not an existential risk.

(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.

(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.

Comment author: nshepperd 05 September 2013 12:31:27PM 6 points [-]

> GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.

GAI is a program. It always does what it's programmed to do. That's the problem—a program that was written incorrectly will generally never do what it was intended to do.
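A toy illustration of that gap, as a minimal sketch rather than a model of any real system: the action table, the numbers, and the function names below are all hypothetical. The program faithfully maximizes the objective that was written down, and that objective is not the one the programmer had in mind.

```python
# Hypothetical actions and their outcomes; every name and number is made up.
ACTIONS = {
    "fix_bugs": {"complaints": 3, "users_helped": 90},
    "disable_complaint_form": {"complaints": 0, "users_helped": 0},
    "do_nothing": {"complaints": 10, "users_helped": 50},
}


def written_objective(outcome):
    # What the programmer actually wrote: fewer complaints is better.
    return -outcome["complaints"]


def intended_objective(outcome):
    # What the programmer meant: help as many users as possible.
    return outcome["users_helped"]


def choose(objective):
    # Pick the action whose outcome scores highest under the given objective.
    return max(ACTIONS, key=lambda name: objective(ACTIONS[name]))


chosen = choose(written_objective)   # what the program does
wanted = choose(intended_objective)  # what the programmer hoped for
print(chosen, wanted)
```

Nothing is broken in the sense of crashing or freezing. The optimizer works exactly as coded, and the mismatch lives entirely in the objective it was given.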

FWIW, I find your statements 3, 4, and 5 also highly objectionable, on the grounds that you are lumping a large class of things under the blank label "errors". Is an "error" doing something that humans don't want? Is it doing something the agent doesn't want? Is it accidentally mistyping a letter in a program, causing a syntax error, or thinking about something heuristically and coming to the wrong conclusion, then making a carefully planned decision based on that mistake? Automatic proof systems don't save you if what you think you need to prove isn't actually what you need to prove.
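The point about proof systems can be made concrete with a small sketch. It uses runtime checks as a stand-in for a formal proof, and the names `weak_spec`, `full_spec`, and `bogus_sort` are hypothetical. A useless "sort" satisfies the property someone thought was sufficient, and fails only once the property that was actually needed is stated.

```python
def is_sorted(xs):
    # True when every element is <= the one after it.
    return all(a <= b for a, b in zip(xs, xs[1:]))


def weak_spec(inp, out):
    # What we thought we needed to prove: the output is in order.
    return is_sorted(out)


def full_spec(inp, out):
    # What we actually needed: in order AND a rearrangement of the input.
    return is_sorted(out) and sorted(inp) == sorted(out)


def bogus_sort(xs):
    # Throws the input away; an empty list is trivially sorted.
    return []


data = [3, 1, 2]
weak_ok = weak_spec(data, bogus_sort(data))  # passes the weak property
full_ok = full_spec(data, bogus_sort(data))  # fails the intended property
print(weak_ok, full_ok)
```

A checker that verifies only `weak_spec` would certify `bogus_sort` without complaint. The checker made no mistake; the error was in choosing what to check.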