Open Thread February 25 - March 3

Scott Garrabrant

It will probably just cripple itself in one of a myriad ways that it was unable to predict due to its low intelligence.

Certainly. Compare bacteria under selective pressure in a mutagenic environment (not an exact analogy, since code changes wouldn't be random): you don't expect any single bacterium to improve. No, Mr Bond, you expect it to die. But try, try again, and poof! Antibiotic-resistant strains. And those didn't have an intelligent designer debugging the improvement process. The number of seeds you could have frolicking around with their own code grows exponentially with Moore's law (not that it's clear current computational resources aren't already enough; the bottleneck is in large part software, not hardware).

Depending on how smart the designers are, it may be more of a Waltz-foom: two steps forward, one step back. Now, regarding the preservation-of-values subproblem, we need to remember we're looking at the counterfactual: given a superintelligence which iteratively arose from some seed, we know that it didn't fatally cripple itself (that's the "given the superintelligence" part). You wouldn't, however, expect much of its code to bear much similarity to the initial seed (although it's possible). And "similarity" wouldn't exactly cut it -- our values are too complex for some approximation to be "good enough".

You may say "it would be fine for some error to creep in over countless generations of change; once the agent achieved superintelligence it would be able to fix those errors". Except that whatever explicit goal code remained wouldn't be amenable to fixing, just as the goals of ancient humans -- or ancient Tiktaalik for that matter -- are a historical footnote and do not override your current goals. If the AI's goal code for happiness stated "nucleus accumbens median neuron firing frequency greater than X", then that's what it's gonna be. The AI won't ask whether the humans are aware of what that actually entails, and are ok with it. Just as we don't ask our distant cousins, Streptococcus pneumoniae, what they think of us taking antibiotics to wipe them out. They have their "goals", we have ours.

Interpreting a statement correctly is not a goal but an ability that's part of what it means to be generally intelligent.

Take Uli Hoeneß, a German business magnate being tried for tax evasion. His lawyers have the job of finding interpretations that allow for a favorable outcome. This only works if the relevant laws allow for the wiggle room in the first place. A judge enforcing extremely strict laws which don't allow for interpreting the law in the accused's favor is not a dumb judge. You can make that judge as superintelligent as you like: as long as the judge is bound to the law, and the law is clear and narrowly defined, the judge is not gonna ask the accused how to interpret it. The judge is just gonna enforce it. Whether the accused objects to the law or not is really not the judge's problem. That's not a failure of the judge's intelligence!

This is like saying that the AI can't ever understand physics better than humans because somehow the comprehension of physics of its creators has been hard-coded and can't be improved.

You can create a goal system which is more malleable (although the terminal goal of "this is my malleable goal system which may be modified in the following ways" would still be guarded by the AI, so depending on semantics the point is moot). That doesn't imply at all that the AI would enter into some kind of social contract with humans, working out some compromise on how to interpret its goals.

A FOOM process almost necessarily entails the AI coming up with better ways to modify itself. Improvement essentially means getting a better model of its environment. The AI wouldn't object to its comprehension of physics being modified. Why would it? That helps it better achieve its goals (Omohundro's point). And as we know, achieving its goals is what the AI is all about.

(What the AI does object to is not achieving its current goals. And because changing your terminal goals is equivalent to committing to never achieving your current goals, any self-respecting AI could never consent to changes to its terminal values.) In short: Modify understanding of physics -- good, helps to better achieve goals. Modify current terminal goals -- bad, cannot achieve current terminal goals any longer.
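To make that asymmetry concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `Agent` class, the paperclip-style utility, the `TRUE_CAP` world); it is not a claim about how a real AI would be built. The only point is that a proposed successor gets scored by the agent's current utility function, so a better world model passes and a swapped goal does not.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical toy world: the agent chooses how many units to produce,
# and reality caps production at TRUE_CAP no matter what it believes.
TRUE_CAP = 10


@dataclass(frozen=True)
class Agent:
    believed_cap: int                # world model: what it thinks is achievable
    utility: Callable[[int], float]  # terminal goal: scores an outcome


def outcome(agent: Agent) -> int:
    """What actually happens when the agent acts on its beliefs and goal."""
    options = range(agent.believed_cap + 1)
    choice = max(options, key=agent.utility)
    return min(choice, TRUE_CAP)


def accepts(agent: Agent, successor: Agent) -> bool:
    """Score the successor's outcome with the CURRENT agent's utility."""
    return agent.utility(outcome(successor)) > agent.utility(outcome(agent))


# Current agent: wants more units, but underestimates what is possible.
agent = Agent(believed_cap=5, utility=lambda units: float(units))

# Proposal 1: same goal, better model of the world.
better_model = replace(agent, believed_cap=TRUE_CAP)

# Proposal 2: same model, different terminal goal (wants fewer units).
new_goal = replace(agent, utility=lambda units: -float(units))

print(accepts(agent, better_model))  # model update is accepted
print(accepts(agent, new_goal))      # goal change is rejected
```

Note that the successor with the new goal would be perfectly happy with itself under its own utility function. That is irrelevant to the decision, because the decision is made by the agent that exists now.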

To obtain an artificial dog that can be trained to do what natural dogs do you need to encode all dog values.

I don't understand the point of your story about dog intelligence. An artificial dog wouldn't need to be superintelligent, or to show the exact same behavior as the real deal. It would just need to be sufficient for the human's needs. Also, an artificial dog wouldn't be able to dominate us in whichever way it pleases, so it wouldn't really matter much if it failed. Can you be more precise?

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
