Yet more "stupid" questions

NancyLebovitz

It's less an issue with value drift* -- which does need to be solved for both goals and constraints -- and more about the complexity of the system.

A well-designed goal hierarchy has an upper limit of complexity. Even if the full definition of human terminal values is too complicated to fit in a single human head, it can at least be extrapolated from things that fit within multiple human brains.

Even the best set of constraint hierarchies do not share that benefit. Constraint systems in the real world are based around the complexity of our moral and ethical systems as contrasted with reality, and thus the cases can expand (literally) astronomically in relation to the total number of variations in the physical environment. Worse, these cases expand in the future and branch correspondingly -- the classical example, as in The Metamorphosis of Prime Intellect or Friendship is Optimal, is an AI built by someone who does not recognize some or all non-human life. A constraint-based AGI built under the average stated legal rules of the 1950s would think nothing of tweaking every person's sexual orientation into heterosexuality: leaving out such a constraint seemed obvious at the time, the goal system might well have been built with such purposes as an incidental part of the goal, and you don't need to explore the underlying ethical assumptions to code or not code that constraint.
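As a toy illustration of that expansion (a sketch only; the feature names and the idea of treating the environment as a handful of discrete variables are assumptions made for the example, not a model of any real system), the number of distinct situations a case-by-case constraint list would have to cover grows multiplicatively with every variable you add:

```python
from itertools import product


def count_cases(num_features: int, values_per_feature: int = 2) -> int:
    """Number of distinct situations when every feature varies independently."""
    return values_per_feature ** num_features


def enumerate_cases(features: dict) -> list:
    """List every combination of feature values as a dict (hypothetical features)."""
    names = list(features)
    return [dict(zip(names, combo))
            for combo in product(*(features[name] for name in names))]


# Hypothetical environment with three small features: 2 * 2 * 3 combinations.
toy_world = {
    "subject_is_human": [True, False],
    "subject_consents": [True, False],
    "harm_level": ["none", "minor", "severe"],
}
cases = enumerate_cases(toy_world)
```

Three small features already give twelve cases to rule on, and three hundred binary features give more cases than could ever be enumerated, which is the sense in which a list of constraints cannot keep up with the environment.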

Worse, a sufficiently powerful self-optimizer will expand into situations outside of the environments a human brain could guess at, or that could possibly fit into a modern human head: does "A robot may not injure a human being or, through inaction, allow a human being to come to harm" prohibit or allow Zygraxis-based treatment? You or I -- or anyone else with less than 10^18 working memory -- can't even imagine what that is, but it's a heck of an ethical problem in our nondescript spacefuture! There's a reason Asimov's Three Laws stories tended to be about the constraints failing or acting unpredictably.

You also run into problems similar to those in AI-Boxing: if a superhuman intellect would value something that directly conflicts with our ethical systems, it's very hard to be smarter than it when making rules.

The Hidden Complexity of Wishes is a pretty good summary of things.

There may still be some useful situations for constraints in FAI theory -- see the Ethical Injunctions sequence -- but they don't really make things safe in a non-FAI-complete setting.

* Although some problems with value drift are related to the complexity of the system: you're more likely to notice drift in one variable out of fifty than in one variable out of ten thousand. I don't think unit tests are a good solution to Löb's problem, though.

EDIT: You can limit the complexity of constraints by making them very broad, but then you end up with a genie that is not very powerful, not very intelligent, or dangerous. See Problem 6 in Dreams of Friendliness.

gattsuru13y20

Lumifer13y00

A well-designed goal hierarchy has an upper limit of complexity.

Why is that (other than the trivial "well-designed" == "upper limit of complexity")?

Even the best set of constraint hierarchies do not share that benefit.

I don't understand this. Any given set of constraint hierarchies is given; it doesn't have a limit. Are you saying that if you want to construct a constraint set to satisfy some arbitrary criteria, you can't guarantee an upper complexity limit? But that seems to be true for goals as well. We have to be careful about u...