scav comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (515)
To be better able to respond to your comment, please let me know in what way you disagree with the following comparison between narrow AI and general AI:
Narrow artificial intelligence will be denoted NAI and general artificial intelligence GAI.
(1) Is it in principle capable of behaving in accordance with human intention to a sufficient degree?
NAI: True
GAI: True
(2) Under what circumstances does it fail to behave in accordance with human intention?
NAI: If it is broken, where broken stands for a wide range of failure modes such as incorrectly managing memory allocations.
GAI: In all cases in which it is not mathematically proven to be tasked with the protection of, and equipped with, a perfect encoding of all human values or a safe way to obtain such an encoding.
(3) What happens when it fails to behave in accordance with human intention?
NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If for example an autonomous car fails at driving autonomously it usually means that it will either go into safe-mode and halt or crash.
GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact except that it completely fails at working as intended in such a way as to destroy all human value in the universe. It will be able to improve itself and capable of obtaining a perfect encoding of human values. It will use those intended capabilities in order to deceive and overpower humans rather than doing what it was intended to do.
(4) What happens if it is bound to use a limited amount of resources, use a limited amount of space or run for a limited amount of time?
NAI: It will only ever do what it was programmed to do. As long as there is no fatal flaw, harming its general functionality, it will work within the defined boundaries as intended.
GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.
Please let me also know where you disagree with the following points:
(1) The abilities of systems are part of human preferences as humans intend to give systems certain capabilities and, as a prerequisite to build such systems, have to succeed at implementing their intentions.
(2) Error detection and prevention is such a capability.
(3) Something that is not better than humans at preventing errors is no existential risk.
(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.
(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.
First list:
1) Poorly defined terms "human intention" and "sufficient".
2) Possibly under any circumstances whatsoever, if it's anything like other non-trivial software, which always has some bugs.
3) Anything from "you may not notice" to "catastrophic failure resulting in deaths". Claim that failure of software to work as humans intend will "generally fail in a way that is harmful to it's own functioning" is unsupported. E.g. a spreadsheet works fine if the floating point math is off in the 20th bit of the mantissa. The answers will be wrong, but there is nothing about that that the spreadsheet could be expected to care about,
4) Not necessarily. GAI may continue to try to do what it was programmed to do, and only unintentionally destroy a small city in the process :)
Second list:
1) Wrong. The abilities of sufficiently complex systems are a huge space of events humans haven't thought about yet, and so do not yet have preferences about. There is no way to know what their preferences would or should be for many many outcomes.
2) Error as failure to perform the requested action may take precedence over error as failure to anticipate hypothetical objections from some humans to something they hadn't expected. For one thing, it is more clearly defined. We already know human-level intelligences act this way.
3) Asteroids and supervolcanoes are not better than humans at preventing errors. It is perfectly possible for something stupid to be able to kill you. Therefore something with greater cognitive and material resources than you, but still with the capacity to make mistakes can certainly kill you. For example, a government.
4) It is already possible for a very fallible human to make something that is better than humans at detecting certain kinds of errors.
5) No. Unless by dramatic you mean "impossibly perfect, magical and universal".