TheAncientGeek comments on Magical Categories - Less Wrong

Post author: Eliezer_Yudkowsky 24 August 2008 07:51PM

Comment author: RobbBB 16 January 2014 09:05:03PM

Assuming morality is lots of highly localised, different things... which I don't, particularly.

The problem of FAI is the problem of figuring out all of humanity's deepest concerns and preferences, not just the problem of figuring out the 'moral' ones (whichever those are). E.g., we want a superintelligence to not make life boring for everyone forever, even if 'don't bore people' isn't a moral imperative.

Regardless, I don't see how the moral subset of human concerns could be simplified without sacrificing most human intuitions about what's right and wrong. Human intuitions as they stand aren't even consistent, so I don't understand how you can think the problem of making them consistent and actionable is going to be a simple one.

If it is not, then you can figure it out anywhere...

Someday, perhaps, with enough time and effort invested. Still, as before, we would expect far more human-intelligence-level aliens (even aliens that knew a lot about human behavior) to be good at building better AIs than at formalizing human value. For the same reason, we should expect far more of the possible AIs we could build to be good at building better AIs than at formalizing human value.

If it is, then the problem the aliens have is not that morality is imponderable.

I don't know what you mean by 'imponderable'. Morality isn't ineffable; it's just far too complicated for us to have figured out yet. We know how things are on Earth: we've been gathering data and theorizing about morality for centuries, and our progress in formalizing morality has been minimal.

An averagely intelligent AI with an average grasp of morality would not be more of a threat than an average human.

An AI that's just a copy of a human running on transistors is much more powerful than a human, because it can think and act much faster.

A smart AI would, all other things being equal, be better at figuring out morality.

It would also be better at figuring out how many atoms are in my fingernail, but that doesn't mean it will ever get an exact count. The question is how rough an approximation of human value we can allow before all value is lost; this is the 'fragility of values' problem. It's not enough for an AGI to do better than us at FAI; it has to be smart enough to solve the problem to a high level of confidence and precision.

But why should moral concepts be so much more difficult than others?

First, because they're anthropocentric; 'iron' can be defined simply because it's a common pattern in Nature, not a rare high-level product of a highly contingent and complex evolutionary history. Second, because they're very inclusive; 'what humans care about' or 'what humans think is Right' is inclusive of many different human emotions, intuitions, cultural conventions, and historical accidents.

But the main point is just that human value is difficult, not that it's the most difficult thing we could do. If other tasks are also difficult, that doesn't necessarily make FAI easier.

An AI smart enough to talk its way out of a box would be able to understand the implicit complexity; an AI too dumb to understand implicit complexity would be boxable. Where is the problem?

You're forgetting the 'seed is not the superintelligence' lesson from The genie knows, but doesn't care. If you haven't read that article, go do so. The seed AI is dumb enough to be boxable, but also too dumb to plausibly solve the entire FAI problem itself. The superintelligent AI is smart enough to solve FAI, but also too smart to be safely boxed; and it doesn't help us that an unFriendly superintelligent AI has solved FAI, if by that point it's too powerful for us to control. You can't safely pass the buck to a superintelligence to tell us how to build a superintelligence safe enough to pass bucks to.

Things are not inherently dangerous just because they are unpredictable. If you have some independent reason for thinking something might turn dangerous, then it becomes desirable to predict it.

Yes. The five theses give us reason to expect superintelligent AI to be dangerous by default. Adding more unpredictability to a system that already seems dangerous will generally make it more dangerous.

They are not assumed to develop mysterious blind spots about falconry or mining engineering. Why assume they will develop a blind spot about morality?

'The genie knows, but doesn't care' means that the genie (i.e., superintelligence) knows how to do human morality (or could easily figure it out, if it felt like trying), but hasn't been built to care about human morality. Knowing how to behave the way humans want you to is not sufficient for actually behaving that way; Eliezer makes that point well in No Universally Compelling Arguments.

The worry isn't that the superintelligence will be dumb about morality; it's that it will be indifferent to morality, and that by the time it exists it will be too late to safely change that indifference. The seed AI (which is not a superintelligence, but is smart enough to set off a chain of self-modifications that lead to a superintelligence) is dumb about morality (approximately as dumb as humans are, if not dumber), and is also probably not a particularly amazing falconer or miner. It only needs to be a competent programmer to qualify as a seed AI.

The average person manages to solve the problem of being moral themselves, in a good-enough way.

Good enough for going to the grocery store without knifing anyone. Probably not good enough for safely ruling the world. With greater power comes a greater need for moral insight, and a greater risk should that insight be absent.

Why isn't the lack of a formalisation of morality a problem for humans?

It is a problem, and it leads to a huge amount of human suffering. It doesn't mean we get everything wrong, but we do make moral errors on a routine basis; the consequences are mostly non-catastrophic because we're slow, weak, and have adopted some 'good-enough' heuristics for bounded circumstances.

We know how humans incrementally improve as moral reasoners: it's called the Kohlberg hierarchy.

Just about every contemporary moral psychologist I've read or talked to seems to think that Kohlberg's overall model is false. (Though some may think it's a useful toy model, and it certainly was hugely influential in its day.) Haidt's The Emotional Dog and Its Rational Tail gets cited a lot in this context.

We do have morality tests. Fail them and you get pilloried in the media or sent to jail.

That's certainly not good enough. Build a superintelligence that optimizes for 'following the letter of the law' and you don't get a superintelligence that cares about humans' deepest values. The law itself has enough inexactness and arbitrariness that it causes massive needless human suffering on a routine basis, though it's another one of those 'good-enough' measures we keep in place to stave off even worse descents into darkness.

If it works like arithmetic, that is, if it is an expansion of some basic principles...

Human values are an evolutionary hack resulting from adaptations to billions of different selective pressures over billions of years, innumerable side-effects of those adaptations, genetic drift, etc. Arithmetic can be formalized in a few sentences. Why think that humanity's deepest preferences are anything like that simple? Our priors should be very low for 'human value is simple' just given the etiology of human value, and our failure to converge on any simple predictive or normative theory thus far seems to only confirm this.
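
To make the 'few sentences' claim concrete, here is a stock example (not anything from this thread): first-order Peano arithmetic, the standard formalization of arithmetic, really is just six axioms plus an induction schema:

\begin{align*}
&\forall x\; S(x) \neq 0 \\
&\forall x\,\forall y\; \bigl(S(x) = S(y) \rightarrow x = y\bigr) \\
&\forall x\; x + 0 = x \\
&\forall x\,\forall y\; x + S(y) = S(x + y) \\
&\forall x\; x \cdot 0 = 0 \\
&\forall x\,\forall y\; x \cdot S(y) = (x \cdot y) + x \\
&\bigl(\varphi(0) \land \forall x\,(\varphi(x) \rightarrow \varphi(S(x)))\bigr) \rightarrow \forall x\; \varphi(x) \quad \text{(induction, one instance per formula } \varphi\text{)}
\end{align*}

Centuries of moral philosophy have produced nothing remotely this compact that predicts or licenses human moral judgments; that asymmetry is the point.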