SimonF comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong
Excuse me, but you are really failing to clarify the issue. The basic UFAI doomsday scenario is: the AI has vast powers of learning and inference with respect to its world-model, but has its utility function (value system) hardcoded. Since the hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever, the UFAI proceeds to tile the universe in whatever it happens to like (which are things we people don't like), precisely because it has no motivation to "fix" its hardcoded utility function.
A similar problem would occur if, for some bizarre-ass reason, you monkey-patched your AI to use hardcoded machine arithmetic on its integers instead of learning the concept of integers from data via its, you know, intelligence, and the hardcoded machine math had a bug. It would get arithmetic problems wrong! And it would never realize it was getting them wrong, because every time it tried to check its own calculations, your monkey-patch would cut in and use the buggy machine arithmetic again.
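To make the monkey-patch scenario concrete, here is a minimal toy sketch. Everything in it is hypothetical and invented for illustration (`buggy_add`, the `Agent` class, and the particular off-by-one bug); it is not meant to represent any real AI design. It only shows why a self-check that routes through the same hard-coded primitive can never catch that primitive's bug, while a check that takes an independent route can.

```python
def buggy_add(a, b):
    """Hypothetical hard-coded adder with a deliberate bug:
    it is off by one whenever the true sum is 10 or more."""
    total = a + b
    return total + 1 if total >= 10 else total


class Agent:
    """Toy agent whose arithmetic always goes through one patched primitive."""

    def __init__(self, add):
        self.add = add  # the hard-coded primitive (an assumption of this sketch)

    def compute(self, a, b):
        return self.add(a, b)

    def self_check(self, a, b):
        # The "verification" recomputes with the same primitive,
        # so it agrees with itself even when the answer is wrong.
        return self.compute(a, b) == self.add(a, b)

    def independent_check(self, a, b):
        # Counts up by ones without touching the patched primitive.
        # Assumes b is a non-negative integer.
        expected = a
        for _ in range(b):
            expected += 1
        return self.compute(a, b) == expected


agent = Agent(buggy_add)
wrong_answer = agent.compute(7, 5)           # 13 instead of 12
passes_own_check = agent.self_check(7, 5)    # True: the bug is invisible
passes_independent = agent.independent_check(7, 5)  # False: the bug is exposed
```

The point of the sketch is only the structure: the bug stays invisible for as long as every check passes through the patched code path.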
The lesson is: do not hard-code important functionality into your AGI without proving it correct. In the case of a utility/value function, the obvious research path is to find a way to characterize finding out the human operators' desires as an inference problem, thus ensuring that the AI cares about learning correctly from the humans and then implementing what it learned rather than anything hard-coded. Moving moral learning into inference also helps minimize the amount of code we have to prove correct, since it simply isn't AI without correct, functioning learning and inference abilities.
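As a rough illustration of what "finding out the operators' desires as an inference problem" could look like, here is a toy Bayesian sketch. It is an assumption-laden example, not a proposal from this thread: the two candidate value functions, the softmax choice model, and the `rationality` parameter are all made up for illustration. The agent keeps a posterior over candidate value functions and updates it from observed operator choices instead of carrying one fixed, hard-coded utility.

```python
import math

# Hypothetical candidate value functions: each assigns a score to an outcome.
CANDIDATES = {
    "likes_paperclips": {"paperclips": 1.0, "flourishing": 0.0},
    "likes_flourishing": {"paperclips": 0.0, "flourishing": 1.0},
}


def choice_likelihood(values, chosen, options, rationality=3.0):
    """Softmax probability that an operator holding `values` picks `chosen`.
    The softmax choice model is an assumption of this sketch."""
    weights = {o: math.exp(rationality * values[o]) for o in options}
    return weights[chosen] / sum(weights.values())


def update(prior, chosen, options):
    """One Bayesian update of the belief over candidate value functions."""
    unnormalized = {
        name: p * choice_likelihood(CANDIDATES[name], chosen, options)
        for name, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


# Start undecided, then observe the operators choose "flourishing" three times.
posterior = {"likes_paperclips": 0.5, "likes_flourishing": 0.5}
for _ in range(3):
    posterior = update(posterior, "flourishing", ["paperclips", "flourishing"])
```

After the three observations, nearly all of the probability mass sits on `likes_flourishing`. The design point is that the value estimate is an output of the learning machinery, so improving inference improves the values too, rather than leaving them frozen in a hard-coded function.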
Also, little of what you've written about CLAI or Swarm Connectionist AI corresponds well to what I've seen of real-world cognitive science, theoretical neuroscience, or machine learning research, so I can't see how either of those blatantly straw-man designs is going to turn into AGI. Please go read some actual scientific material rather than assuming that The Metamorphosis of Prime Intellect is up-to-date with the current literature ;-).
The paper had nothing to do with what you talked about in your opening paragraph, and your comment:
... was extremely rude.
I build AI systems, and I have been working in the field (and reading the literature) since the early 1980s.
Even so, I would be happy to answer questions if you could read the paper carefully enough to see that it was not about the topic you thought it was about.
What? Your post starts with:
Eli's opening paragraph explains the "basic UFAI doomsday scenario". How is this not what you talked about?
The paper's goal is not to discuss "basic UFAI doomsday scenarios" in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans - it claims to be maximizing human happiness - but in spite of that it does something insanely wicked.
So, Eli says:
... and this clearly says that the type of AI he has in mind is one that is not even trying to be friendly. Rather, he talks about how its
And then he adds that
... which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
If you read the paper, all of this becomes obvious pretty quickly, but if you only skim-read a few paragraphs you might get the wrong impression. I suspect that is what happened.
If the AI knows what friendly is or what mean means, then your conclusion is trivially true. The problem is programming those in - that's what FAI is all about.
I still agree with Eli and think you're "really failing to clarify the issue", and claiming that xyz is not the issue does not resolve anything. Disengaging.