cata comments on An Xtranormal Intelligence Explosion - Less Wrong
Comments (80)
That's awfully convenient.
Not really. An AI that didn't have a specific desire to be friendly to mankind would want to kill us to cut down on unnecessary entropy increases.
As you get closer to the mark, with AGIs whose utility function roughly resembles what we would want but is still wrong, the end results are most likely worse than death, especially since there should be many more near misses than exact hits. For example, an AGI that doesn't want to let you die, regardless of what you go through, and has little regard for your other sorts of well-being, would be closer to an FAI than a paperclip maximizer that would just plain kill you. As you get closer to the core of Friendliness, you get all sorts of weird AGIs that want to do something that twistedly resembles something good, but is somehow missing something or is somehow altered so that the end result is not at all what you wanted.
Is this true or is this a useful assumption to protect us from doing something stupid?
Is it true that Friendliness is not an attractor or is it that we cannot count on such a property unless it is absolutely proven to be the case?
My idea there was that if it's not Friendly, then it's not Friendly, ergo it is doing something that you would not want an AI to be doing (if you thought faster and knew more and all that). That's the core of the quote you had there. A random intelligent agent would simply transform us into something of value to it, so we would most likely die very quickly. However, when you get closer to Friendliness, the AI is no longer totally indifferent to us, but rather is maximizing something that could involve living humans. Now, if you take an AI that wants there to be living humans around, but is not known for sure to be Friendly, what could go wrong? My answer: many things, as what humans prefer to be doing is a rather complex set of stuff, and even quite small changes could make us really, really unsatisfied with the end result. At least, that's the idea I've gotten from posts here like Value is Fragile.
When you ask if Friendliness is an attractor, do you mean to ask if intelligences near Friendly ones in design space tend to transform into Friendly ones? This seems rather unlikely, as AIs of that sort are most likely capable of preserving their utility function, and the direction of this transformation is not "natural". For these reasons, arriving at Friendliness is not easy, and thus, I'd say you gotta have some sort of way to ascertain the Friendliness before you can trust it to be just that.
Is this also true if you replace "mankind" with "ants" or "daffodils"?
Ants and daffodils might, by some definitions, have preferences--but it wouldn't be necessary for a FAI to explicitly consider their preferences, as long as their preferences constitute some part of humanity's CEV, which seems likely: I think an intact Earth ecosystem would be rather nice to retain, if at all possible.
The entropic contribution of ants and daffodils would doubtless make them candidates for early destruction by a UFAI, if such a step even needed to be explicitly taken alongside destroying humanity.
Imagine an AGI with the opposite utility function of an FAI: it minimizes the Friendly Utility Function, which would involve doing things far worse than killing us. If you are not putting effort into choosing a utility function, building this AGI seems as likely as building an FAI, as do lots of other possibilities in the space of AGIs whose utility functions refer to humans, some of which would keep us alive, not all in ways we would appreciate.
The reason I would expect an AGI in this space to be somewhat close to Friendly is this: just hitting the space of utility functions that refer to humans is hard. If it happens, it is likely because a human deliberately hit it, and this should indicate that the human has the skill and motivation to optimize further within that space to build an actual Friendly AGI.
If you stipulate that the programmer did not make this effort, and hitting the space of AGIs that keep humans alive only occurred in tiny quantum branches, then you have screened off the argument of a skilled FAI developer, and it seems unlikely that the AGI within this space would be Friendly.
You've made a lot of good comments in this thread, but I disagree with this. As likely?
It seems you are assuming that every possible point in AI mind space is equally likely, regardless of history, context, or programmer intent. This is like saying that, if someone writes a routine to sort numbers numerically, it's just as likely to sort them phonetically.
It seems likely to me that this belief, that the probability distribution over AI mindspace is flat, has become popular on LessWrong, not because there is any logic to support it, but because it makes the Scary Idea even scarier.
Yes, my predictions of what will happen when you don't put effort into choosing a utility function are inaccurate in the case where you do put effort into choosing a utility function.
Well, let's suppose someone wants a routine to sort numbers numerically, but doesn't know how to do this, and tries a bunch of stuff without understanding. Conditional on the programmer miraculously achieving some sort of sorting routine, what should we expect about it? Sorting phonetically would add extra complication over sorting numerically, as the information about the names of numbers would have to be embedded within the program, so that would seem less likely. But a routine that sorts numerically ascending is just as likely as a routine that sorts numerically descending, as these routines have a complexity-preserving one-to-one correspondence obtained by interchanging "greater than" with "less than".
And the utility functions I claimed were equally likely before have the same complexity-preserving one-to-one correspondence.
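To make the sorting analogy concrete, here is a minimal sketch, not taken from the original comment. The helper name `sort_with` and the choice of insertion sort are assumptions made purely for illustration. The point it shows is that the ascending and descending routines are the same program except for one comparison operator, so neither is more complex than the other.

```python
import operator


def sort_with(items, out_of_order):
    """Insertion sort (hypothetical helper, chosen only for illustration).

    `out_of_order(a, b)` returns True when a appears before b
    but should come after it.
    """
    result = list(items)
    for i in range(1, len(result)):
        j = i
        # Walk the new element left until its neighbour is in order.
        while j > 0 and out_of_order(result[j - 1], result[j]):
            result[j - 1], result[j] = result[j], result[j - 1]
            j -= 1
    return result


def sort_ascending(items):
    # Swap whenever the earlier element is greater than the later one.
    return sort_with(items, operator.gt)


def sort_descending(items):
    # Identical routine with "greater than" exchanged for "less than".
    return sort_with(items, operator.lt)


numbers = [3, 1, 4, 1, 5, 9, 2, 6]
print(sort_ascending(numbers))
print(sort_descending(numbers))
```

A phonetic sort, by contrast, would need an extra table mapping numbers to their names, which is the added complexity the comment refers to.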
An AI that had a botched or badly preserved Friendliness, or that was unfriendly but had been initialized with supergoals involving humans, may well have specific, unpleasant, non-extermination plans for humans.
As in, "I have no mouth and I must scream".
Would it? Though we do contribute to entropy, things like, say, stars do so at a much faster pace. Admittedly this is logically distinct from the AI's decision to destroy humanity, but I don't see why it would immediately jump to the conclusion that we should be wiped out when the main sources of entropy are elsewhere.
More to the point, not all unFriendly AIs would necessarily care about entropy.
It's kind of a moot question though since shutting off the sun would also be a very effective means of killing people.
For almost any objective an AI had, it could better accomplish it the more free energy the AI had. The AI would likely go after entropy losses from both stars and people. The AI couldn't afford to wait to kill people until after it had dealt with nearby stars because by then humans would have likely created another AI god.
Assuming that by "AI" you mean something that maximizes a utility function, as opposed to a dumb apocalypse like a grey-goo or energy virus scenario.
I can see how a “dumb apocalypse like a grey-goo or energy virus” would be Artificial, but why would you call it Intelligent?
On this site, unless otherwise specified, AI usually means “at least as smart as a very smart human”.
Yeah, that makes sense. I was going to suggest "smart enough to kill us", but that's a pretty low bar.