It seems to me that while the terminal values of morality might be fixed per individual, and relatively invariant per species, the things to do to get the most 'points' (utility?) might well make a supposedly friendly AI look as though it were behaving in a pretty evil manner to us. I wonder, whether the Friendly AI project succeeds or not, how soon, if at all, we would really know that it had worked.

I suppose, though, that's putting it in terms of human levels of intelligence. To us, the only solution to overpopulation, for instance, might seem to be having a bunch of us die off so that the rest, and future generations, can live more comfortably (birth control alone creates problems where you have too many grandparents and not enough caretakers, like the "four-two-one" problem in China). If overpopulation turned out to be a huge problem, though, a sufficiently advanced AI might be able to mobilize enough infrastructure to house people rapidly enough that their quality of life would not be much diminished, so that euthanizing a good portion of the population to preserve the lives and sanity of the remainder would never be the only option. Some high-population-density structures seem like they might actually be enjoyable places to live...

Still, it's entirely possible that for non-terminal reasons a perfectly friendly AI might scare the hell out of us, although if it were forced to do that, the outcome would very likely be better than the alternative consequence it was seeking to avoid.
I am reading through the meta-ethics sequence for the first time. One thing I couldn't help but notice in this dialogue, which I thought was interesting, was Obert's claim: "Duties, and should-ness, seem to have a dimension that goes beyond our whims. If we want different pizza toppings today, we can order a different pizza without guilt; but we cannot choose to make murder a good thing." It seemed odd to me that Subhan didn't mention regret at having made a difficult choice between competing wants (wondering whether you should have taken up piano playing instead of plumbing, say) as possibly something like the kind of negative feelings we get from guilt. We can't always order a different pizza without some sense of loss.
Fascinating discussion and blog. Surely one obvious safeguard against a super-smart AI agent going morally astray or running amok would be to inseparably associate with it a dumber "confessor" AI agent which, while lacking its prowess and flexibility, would have at least the run-of-the-mill intelligence needed to detect when a proposal might conflict with acceptable human moral standards.
I called it a confessor by analogy with priests privy to the sins and wicked thoughts of even powerful people, but plenty of other analogies come to mind: an eight-stone jockey controlling a half-ton racehorse far faster and stronger than its rider, or a high-resistance loop off a million-volt power line to which a small instrument can be rigged to indicate the current flowing in the main line.
You could even have a cascade of agents, each somewhat dumber and less flexible than the last, but all required to justify their decisions down the line prior to action; the first one that failed to agree on a plan (by either not understanding it or concluding it was immoral) would flag a warning to human observers.
One thing I kind of like about this idea is that the 'confessor' could be faster than the 'horse' simply by being dumber (and taking less code to run).
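To make the cascade concrete, here is a minimal sketch in Python of how the approval chain might work. Everything in it (the Verdict and Agent types, run_cascade, and the toy keyword_confessor) is invented purely for illustration; a real confessor would be a full, if weaker, AI rather than a keyword filter.

    from dataclasses import dataclass
    from typing import Callable, List, Optional

    @dataclass
    class Verdict:
        # One agent's judgment of a plan: approve, or balk with a reason
        # (it either didn't understand the plan or judged it immoral).
        approved: bool
        reason: str = ""

    # Each agent in the cascade is just a function from a plan (a plain
    # string here) to a Verdict.
    Agent = Callable[[str], Verdict]

    def run_cascade(plan: str, confessors: List[Agent]) -> Optional[str]:
        """Ask each (progressively dumber) confessor to sign off in turn.

        Returns None if all approve (the plan may proceed), or a warning
        string naming the first confessor that balked, to be shown to
        human observers instead of executing the plan.
        """
        for i, confessor in enumerate(confessors):
            verdict = confessor(plan)
            if not verdict.approved:
                return f"confessor {i} balked: {verdict.reason}"
        return None  # every agent agreed; the plan may proceed

    # A trivially dumb example confessor: rejects any plan mentioning
    # obviously unacceptable actions. A real one would be a weaker but
    # still genuinely intelligent agent, not a keyword filter.
    def keyword_confessor(plan: str) -> Verdict:
        for word in ("euthanize", "kill", "deceive"):
            if word in plan.lower():
                return Verdict(False, f"plan mentions '{word}'")
        return Verdict(True)

    warning = run_cascade("Euthanize 10% of the population.", [keyword_confessor])
    print(warning or "plan approved")
    # -> confessor 0 balked: plan mentions 'euthanize'

Note how the speed point falls out of the design: a confessor only has to veto plans, not generate them, so its check can be far cheaper to run than the 'horse' it is watching.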