thomblake comments on Holden's Objection 1: Friendliness is dangerous - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (428)
EDIT: To simplify my thoughts: getting a General Intelligence Algorithm (GIA) instance to do anything requires masterful manipulation of parameters, with full knowledge of how it will generally behave as a result. It requires a level of understanding of the psychology behind all intelligent (and sub-intelligent) behavior. It is not feasible that someone would accidentally program something that would become an evil mastermind. GIA instances could easily be made to behave passively even when given affordances and output, kind of like a person who is happy to assist in any way possible because they are generally warm or high or something.
You can define the most important elements of human values for a GIA instance, because most human values are a direct logical consequence of something that cannot be separated from the GIA. I.e., if general motivation X accidentally drove intelligence (see: Orthogonality Thesis) and it also drove positive human values, then positive human values would be unavoidable. It is true that the specifics of body and environment drive some specific human values, but those are just side effects of X in that environment, and X in different environments only changes so much, and in predictable ways.
You can directly implant knowledge/reasoning into a GIA instance. The easiest way to do this is to train one under very controlled circumstances, and then copy the pattern. This reasoning would then condition the GIA instance's interpretation of future input. However, under conditions that directly disprove the value of that reasoning in obtaining X, the GIA instance would un-integrate that pattern and reintegrate a new one. This can be influenced with parameter weights.
I suppose this could be a concern regarding the potential generation of an anger instinct. This depends HEAVILY on all the parameters, however, and on any outputs given to the GIA instance. Also, robots and computers do not have to eat, and have no instincts associated with killing things in order to do so... Nor do they have reproductive instincts...
When you say "predictable", do you mean in principle or actually predictable?
That is, are you claiming that you can predict what any human values, given their environment, and furthermore that the environment can be easily and compactly specified?
Can you give an example?
Mathematically predictable but somewhat intractable without a faster running version of the instance, with the same frequency of input. Or predictable within ranges of some general rule.
Or just generally predictable with the level of understanding afforded to someone capable of making one in the first place, that for instance could describe the cause of just about any human psychological "disorder".