eli_sennesh comments on Debunking Fallacies in the Theory of AI Motivation - LessWrong
Excuse me, but you are really failing to clarify the issue. The basic UFAI doomsday scenario is: the AI has vast powers of learning and inference with respect to its world-model, but has its utility function (value system) hardcoded. Since the hardcoded utility function does not specify a naturalization of morality, or CEV, or whatever, the UFAI proceeds to tile the universe with whatever it happens to like (which are things we humans don't like), precisely because it has no motivation to "fix" its hardcoded utility function.
A similar problem would occur if, for some bizarre-ass reason, you monkey-patched your AI to use hardcoded machine arithmetic on its integers instead of learning the concept of integers from data via its, you know, intelligence, and the hardcoded machine math had a bug. It would get arithmetic problems wrong! And it would never realize it was getting them wrong, because every time it tried to check its own calculations, your monkey-patch would cut in and use the buggy machine arithmetic again.
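A toy version of the monkey-patch scenario, as a hypothetical Python sketch: `buggy_add`, `Agent`, and `count_based_add` are invented for illustration, and the bug (2 + 2 = 5) is planted deliberately. The self-check routes through the same hard-coded function, so it can never catch the error. Only an independent route to the same fact can.

```python
def buggy_add(a: int, b: int) -> int:
    """Hypothetical hard-coded machine arithmetic with a planted bug."""
    if a == 2 and b == 2:
        return 5  # the bug
    return a + b


class Agent:
    def __init__(self, add):
        # The monkey-patch: every arithmetic operation goes through `add`.
        self.add = add

    def compute(self, a, b):
        return self.add(a, b)

    def check(self, a, b, claimed):
        # Self-verification recomputes with the very same function,
        # so it can never disagree with the original answer.
        return self.add(a, b) == claimed


def count_based_add(a, b):
    # An independent route to the same fact: concatenate two lists and count.
    return len([0] * a + [0] * b)


agent = Agent(buggy_add)
answer = agent.compute(2, 2)
print(answer)                                # 5
print(agent.check(2, 2, answer))             # True: invisible from inside
print(count_based_add(2, 2) == answer)       # False: an independent check catches it
```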
The lesson is: do not hard-code important functionality into your AGI without proving it correct. In the case of a utility/value function, the obvious research path is to find a way to characterize finding out the human operators' desires as an inference problem, thus ensuring that the AI cares about learning correctly from the humans and then implementing what it learned rather than anything hard-coded. Moving moral learning into inference also helps minimize the amount of code we have to prove correct, since it simply isn't AI without correct, functioning learning and inference abilities.
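One minimal way to cast "finding out the human operators' desires" as an inference problem, sketched in Python under loudly stated assumptions: the hypothesis set, the outcome names, and the Boltzmann-rational (softmax) model of human choice are all illustrative choices, not anything specified above. The agent keeps a posterior over candidate value functions and acts on posterior expected utility rather than on one fixed, hard-coded utility.

```python
import math

# Hypothetical candidate value functions: each maps an outcome to a utility.
HYPOTHESES = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.0, "paperclips": 0.0},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0, "paperclips": 0.0},
    "likes_clips":  {"tea": 0.0, "coffee": 0.0, "paperclips": 1.0},
}


def choice_likelihood(utility, options, chosen, beta=3.0):
    """P(human picks `chosen` from `options`) under an assumed
    Boltzmann-rational (softmax) choice model with rationality `beta`."""
    weights = {o: math.exp(beta * utility[o]) for o in options}
    return weights[chosen] / sum(weights.values())


def posterior(observations, prior=None):
    """Bayes' rule over the candidate value functions."""
    if prior is None:
        prior = {h: 1.0 / len(HYPOTHESES) for h in HYPOTHESES}
    unnorm = {}
    for h, utility in HYPOTHESES.items():
        p = prior[h]
        for options, chosen in observations:
            p *= choice_likelihood(utility, options, chosen)
        unnorm[h] = p
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}


def best_outcome(post, outcomes):
    """Act on posterior expected utility, not on a single hard-coded utility."""
    def expected_utility(o):
        return sum(post[h] * HYPOTHESES[h][o] for h in HYPOTHESES)
    return max(outcomes, key=expected_utility)


# Observed human choices: (options offered, option chosen).
obs = [(("tea", "coffee"), "tea"), (("tea", "paperclips"), "tea")]
post = posterior(obs)
print(post)
print(best_outcome(post, ["tea", "coffee", "paperclips"]))  # tea
```

With the two observed choices, nearly all posterior mass lands on `likes_tea`; feeding in different choices would shift it, with no change to the code. What remains hard-coded is the learning machinery (hypothesis space, likelihood model), not the values themselves.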
Also, little you've written about CLAI or Swarm Connectionist AI corresponds well to what I've seen of real-world cognitive science, theoretical neuroscience, or machine learning research, so I can't see how either of those blatantly straw-man designs are going to turn into AGI. Please go read some actual scientific material rather than assuming that The Metamorphosis of Prime Intellect is up-to-date with the current literature ;-).
The content of your post was pretty good from my limited perspective, but this tone is not warranted.
Perhaps not, but I don't understand why "AI" practitioners insist on being almost as bad as philosophers at butting in and trying to explain to reality that it needs to get into their models and stay there, rather than trying to understand existing phenomena as a prelude to a general theory of cognition.
The paper had nothing to do with what you talked about in your opening paragraph, and your comment:
... was extremely rude.
I build AI systems, and I have been working in the field (and reading the literature) since the early 1980s.
Even so, I would be happy to answer questions if you could read the paper carefully enough to see that it was not about the topic you thought it was about.
What? Your post starts with:
Eli's opening paragraph explains the "basic UFAI doomsday scenario". How is this not what you talked about?
The paper's goal is not to discuss "basic UFAI doomsday scenarios" in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans - it claims to be maximizing human happiness - but in spite of that it does something insanely wicked.
So, Eli says:
... and this clearly says that the type of AI he has in mind is one that is not even trying to be friendly. Rather, he talks about how its
And then he adds that
... which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
If you read the paper all of this is obvious pretty quickly, but perhaps if you only skim-read a few paragraphs you might get the wrong impression. I suspect that is what happened.
If the AI knows what "friendly" is or what "mean" means, then your conclusion is trivially true. The problem is programming those in - that's what FAI is all about.
I still agree with Eli and think you're "really failing to clarify the issue", and claiming that xyz is not the issue does not resolve anything. Disengaging.
Yes, it was rude.
Except that the paper was about more-or-less exactly what I said in that paragraph. But the whole lesson is: do not hard-code things into AGI systems. Luckily, we learn this lesson everywhere: symbolic, first-order-logic-based AI failed miserably. It failed not only to generate a superintelligent ethicist but, in fact, to detect which pictures are cat pictures or to perform commonsense inference.
Ok, and how many of those possessed anything like human-level cognitive abilities? How many were intended to, but failed? How many were designed on a solid basis in statistical learning?
No, as I just explained to SimonF, below, that is not what it is about.
I will repeat what I said:
The paper's goal is not to discuss "basic UFAI doomsday scenarios" in the general sense, but to discuss the particular case where the AI goes all pear-shaped EVEN IF it is programmed to be friendly to humans.
That last part (even if it is programmed to be friendly to humans) is the critical qualifier that narrows down the discussion to those particular doomsday scenarios in which the AI does claim to be trying to be friendly to humans - it claims to be maximizing human happiness - but in spite of that it does something insanely wicked.
So, you said:
... and this clearly says that the type of AI you have in mind is one that is not even trying to be friendly. Rather, you talk about how its
And then you add that
... which has nothing to do with the cases that the entire paper is about, namely the cases where the AI is trying really hard to be friendly, but doing it in a way that we did not intend.
There is very little distinction, from the point of view of actual behaviors, between a supposedly-Friendly-but-actually-not AI, and a regular UFAI. Well, maybe the former will wait a bit longer before its pathological behavior shows up. Maybe. I really don't want to be the sorry bastard who tries that experiment: it would just be downright embarrassing.
But of course, the simplest way to bypass this, as previously mentioned in my comment and by nearly all authors on the issue, is precisely to specify the utility function as the outcome of an inference problem, thus ensuring that additional interaction with humans causes the AI to update its utility function and become Friendlier with time.
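Written as equations, under assumptions the comment does not state: candidate utility functions U_θ over outcomes o are indexed by a parameter θ, D_t is the set of human interactions observed so far, and each interaction d is conditionally independent of the others given θ. "Utility as the outcome of inference" might then look like:

```latex
\begin{aligned}
P(\theta \mid D_{t+1}) &\propto P(d_{t+1} \mid \theta)\, P(\theta \mid D_t),
  \qquad D_{t+1} = D_t \cup \{d_{t+1}\}, \\
U_t(o) &= \mathbb{E}_{\theta \sim P(\theta \mid D_t)}\big[\, U_\theta(o) \,\big]
        = \sum_{\theta} P(\theta \mid D_t)\, U_\theta(o).
\end{aligned}
```

Each new interaction reweights the hypotheses, so the effective utility U_t moves without anyone editing code. The sum form assumes a finite hypothesis set; a continuous θ would use an integral instead.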
Causal inference that allows for deliberate conditioning of distributions on complex, counterfactual scenarios should actually help with this. Causal reasoning does dissolve into counterfactual reasoning, after all, so rational action on evaluative criteria can be considered a kind of push-and-pull force acting on an agent's trajectory through the space of possible histories: undesirable counterfactuals push the agent's actions away (i.e., push the agent to prevent them from becoming real), while desirable counterfactuals pull the agent's actions towards themselves (i.e., the agent takes actions to achieve those events as goals) :-p.
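A toy sketch of the push/pull picture in Python. It simplifies in a way worth naming: it uses interventional queries (the do-operator) on a hypothetical two-variable structural causal model rather than full counterfactuals, and the umbrella scenario, `DESIRABILITY`, and `ACTION_COST` are invented for illustration. Undesirable outcomes carry negative weight and push the agent away from actions that make them likely; desirable outcomes pull.

```python
import random


def sample_outcome(action, rng):
    """Hypothetical structural causal model: rain is exogenous noise,
    the action is set by intervention, and the outcome depends on both."""
    rain = rng.random() < 0.3
    if action == "carry_umbrella":
        return "dry"
    return "soaked" if rain else "dry"


# Desirable outcomes "pull" (positive weight); undesirable ones "push" (negative).
DESIRABILITY = {"dry": 1.0, "soaked": -2.0}
ACTION_COST = {"carry_umbrella": 0.2, "no_umbrella": 0.0}


def value_of_intervention(action, n=10_000, seed=0):
    """Monte Carlo estimate of E[desirability | do(action)], minus the action's cost."""
    rng = random.Random(seed)
    total = sum(DESIRABILITY[sample_outcome(action, rng)] for _ in range(n))
    return total / n - ACTION_COST[action]


actions = ["carry_umbrella", "no_umbrella"]
scores = {a: value_of_intervention(a) for a in actions}
print(scores)
print(max(actions, key=scores.get))  # carry_umbrella
```

Carrying the umbrella scores 0.8 exactly; going without scores about 0.7 - 0.6 = 0.1, because the 30% chance of the undesirable "soaked" outcome pushes hard against it.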
What does that mean? That any AI will necessarily have a hardcoded, mathematical UF, or that MIRI's UFAI scenario only applies to certain AI architectures? If the latter, then doing things differently is a reasonable response. Alternatives could involve corrigibility, or expressing goals in natural language. Talking about alternatives isn't irrelevant, in the absence of a proof that MIRI's favoured architecture subsumes everything.
It's entirely possible to build a causal learning and inference engine that does not output any kind of actions at all. But if you have made it output actions, then the cheapest, easiest way to describe which actions to output is hard-coding (i.e., writing program code that computes actions from models without performing an additional stage of data-based inference). Since that cheap-and-easy, quick-and-dirty design amounts to a hardcoded utility function, and since that design is more-or-less what AI practitioners usually talk about, we tend to focus the doomsaying on that design.
There are problems with every design except for the right design, when you are talking about an agent you expect to become more powerful than yourself.
How likely is that cheap and easy architecture to be used in an AI of more than human intelligence?
Well, people usually build the cheapest and easiest architecture of anything they can, at first, so very likely.
And remember, "higher than human intelligence" is not some superpower that gets deliberately designed into the AI. The AI is designed to be as intelligent as its computational resources allow for: to compress data well, to perform learning and inference quickly in its models, and to integrate different domains of features and models (again: for compression and generalization). It just gets "higher than human" when it starts integrating feature data into a broader, deeper hierarchy of models faster and with better compression than a human can.
It's likely to be used, but is it likely both to be used and to achieve, almost accidentally, higher intelligence?
Yes. "Higher than human intelligence" does not require that the AI take particular action. It just requires that it come up with good compression algorithms and integrate a lot of data.
You're not really saying why it's likely.
Because "intelligence", in terms like IQ that make sense to a human being, is not a property of the algorithm, it's (as far as my investigations can tell) a function of:
So basically, if you just keep giving your AGI more CPU power and storage space, I do think it will cross over into something dangerously like superintelligence, which I think really just reduces to:
There is no gap-in-kind between your reasoning abilities and those of a dangerously superintelligent AGI. It just has a lot more resources for doing the same kinds of stuff.
An easy analogy for beginners shows up the first time you read about sampling-based computational Bayesian statistics: the accuracy of the probabilities inferred depends directly on the sample size. Since additional computational power can always be put towards more samples on the margin, you can always get your inferred estimates marginally closer to the real probabilities just by adding compute time.
By adding exponentially more time.
Computational complexity can't simply be waved away by saying "add more time/memory".
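Both of the last two comments can hold at once. Assuming a plain Monte Carlo estimate p̂_N of a probability p from N independent samples (MCMC draws are correlated, so their effective sample size is smaller than N), the standard error and the sample size needed for k decimal digits of accuracy are:

```latex
\begin{aligned}
\mathrm{SE}(\hat{p}_N) &= \sqrt{\frac{p(1-p)}{N}}, \\
\mathrm{SE}(\hat{p}_N) \le \varepsilon &\iff N \ge \frac{p(1-p)}{\varepsilon^{2}},
  \qquad \varepsilon = 10^{-k} \implies N \ge p(1-p)\, 10^{2k}.
\end{aligned}
```

The error does shrink with every added sample, but each extra decimal digit costs about 100 times more samples: polynomial in 1/ε, exponential in the number of digits. This also only measures distance to whatever quantity the sampler targets, not necessarily to the truth.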
Hold on, hold on. There are at least two samples involved.
Sample 1 is your original data sampled from reality. Its size is fixed -- additional computational power will NOT get you more samples from reality.
Sample 2 is an intermediate step in "computational Bayesian statistics" (e.g., MCMC). Its size is arbitrary and yes, you can always increase it by throwing more computational power at the problem.
However by increasing the size of sample 2 you do NOT get "marginally closer to the real probabilities", for that you need to increase the size of sample 1. Adding compute time gets you marginally closer only to the asymptotic estimate which in simple cases you can even calculate analytically.
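The two-sample distinction is easy to see in a conjugate Beta-Bernoulli model. In this Python sketch (all numbers are arbitrary choices for illustration), sample 1 is a fixed batch of simulated coin flips, and sample 2 is drawn from the resulting posterior; exact posterior sampling via `random.Random.betavariate` stands in for MCMC to keep it short. As M grows, the estimate converges to the analytic posterior mean, while the gap between that mean and the true p is set by the data and does not shrink.

```python
import random

rng = random.Random(42)

# Sample 1: data drawn from "reality" (simulated). Its size is fixed once collected.
P_TRUE = 0.3
N_DATA = 20
data = [rng.random() < P_TRUE for _ in range(N_DATA)]
k = sum(data)  # number of successes

# Uniform Beta(1, 1) prior + Bernoulli likelihood => Beta posterior (conjugacy).
a_post, b_post = 1 + k, 1 + N_DATA - k
analytic_mean = a_post / (a_post + b_post)  # what sample 2 converges to


def posterior_mean_estimate(m, seed=0):
    """Sample 2: m draws from the posterior (exact sampling standing in for MCMC)."""
    r = random.Random(seed)
    return sum(r.betavariate(a_post, b_post) for _ in range(m)) / m


for m in (10, 1_000, 100_000):
    est = posterior_mean_estimate(m)
    # The first gap shrinks as m grows; the second depends only on the data.
    print(f"M={m:>7}: |est - analytic| = {abs(est - analytic_mean):.4f}, "
          f"|analytic - true p| = {abs(analytic_mean - P_TRUE):.4f}")
```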
That's two lessons. Not hardcoding at all is underexplored round here.
You have to hardcode something, don't you?
I meant not hardcoding values or ethics.
Well, you'd have to hardcode at least a learning algorithm for values if you expect to have any real chance that the AI behaves like a useful agent, and that falls within the category of important functionalities. But then I guess you'll agree with that.
Don't feed the troll. "Not hardcoding values or ethics" is the idea behind CEV, which seems frequently "explored round here." Though I admit I do see some bizarre misunderstandings.