Perfectly Friendly AI

Desrtopa

For the purposes of discussion on this site, a Friendly AI is assumed to be one that shares our terminal values. It's a safe genie that doesn't need to be told what to do, but anticipates how to best serve the interests of its creators. Since our terminal values are a function of our evolutionary history, it seems reasonable to assume that an FAI created by one intelligent species would not necessarily be friendly to other intelligent species, and that being subsumed by another species' FAI would be fairly catastrophic.

Except... doesn't that seem kind of bad? Supposing I were able to create a strong AI, and it created a sound fun-theoretic utopia for human beings, but then proceeded to expand and subsume extraterrestrial intelligences, and subject them to something they considered a fate worse than death, I would have to regard that as a major failing of my design. My utility function assigns value to the desires of beings whose values conflict with my own. I can't allow other values to supersede mine, but absent other considerations, I have to assign negative utility in my own function for creating negative utility in the functions of other existing beings. I'm skeptical that an AI that would impose catastrophe on other thinking beings is really maximizing my utility.

It seems to me that to truly maximize my utility, an AI would need to have consideration for the utility of other beings. Secondary consideration, perhaps, but it could not maximize my utility simply by treating them as raw material with which to tile the universe with my utopian civilization.
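To make the "secondary consideration" idea concrete, here is a toy sketch. Everything in it is an assumption made up for illustration: the function names, the linear weighting, the 0.1 weight, and the outcome numbers. It is not a claim about how real utility functions are structured. It only shows that once other beings' utility enters my own function with a small positive weight, a catastrophic enough outcome for them can outweigh a modest gain for me.

```python
# Toy model only: the linear form, the weight, and all numbers are hypothetical.

def combined_utility(own, others, weight=0.1):
    """My utility for an outcome, plus a discounted sum of what others assign to it.

    own    -- utility I get directly from the outcome
    others -- utilities that other beings assign to the same outcome
    weight -- how much I care about others relative to myself (secondary consideration)
    """
    return own + weight * sum(others)


def best_outcome(outcomes, weight=0.1):
    """Return the name of the outcome with the highest combined utility."""
    return max(
        outcomes,
        key=lambda name: combined_utility(
            outcomes[name]["own"], outcomes[name]["others"], weight
        ),
    )


# Two made-up outcomes: tiling the universe with my utopia (catastrophic for
# the aliens) versus coexisting (slightly worse for me, fine for them).
outcomes = {
    "tile_universe": {"own": 10.0, "others": [-1000.0]},
    "coexist": {"own": 8.0, "others": [5.0]},
}

print(best_outcome(outcomes))              # small weight on others' utility
print(best_outcome(outcomes, weight=0.0))  # no consideration for others at all
```

With these made-up numbers, any nonzero concern for the aliens flips the choice away from tiling. With the weight at zero, the AI treats them as raw material.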

Perhaps my utility function gives more value than most to beings that don't share my values (full disclosure: I prefer the "false" ending of Three Worlds Collide, although I don't consider it ideal). However, if an AI imposes truly catastrophic fates on other intelligent beings, my own utility function takes such a hit that I cannot consider it friendly. A true Friendly AI would need to be at least passably friendly to other intelligences to satisfy me.

I don't know if I've finally come to terms with Eliezer's understanding of how hard Friendly AI is, or made it much, much harder, but it gives me a somewhat humbling perspective on the true scope of the problem.

Inspired by Don't Plan For the Future.

Note: I get super redundant after like the first reply, so watch out for that. I'm not trying to be an asshole or anything; I'm just attempting to respond to your main point from every possible angle.

For the purposes of discussion on this site, a Friendly AI is assumed to be one that shares our terminal values.

What's a "terminal value"?

My utility function assigns value to the desires of beings whose values conflict with my own.

Even for somebody trying to kill you for fun?

I can't allow other values to supersede mine, but absent other considerations, I have to assign negative utility in my own function for creating negative utility in the functions of other existing beings.

What exactly would those "other considerations" be?

I have to assign negative utility in my own function for creating negative utility in the functions of other existing beings.

Would you be comfortable being a part of putting somebody in jail for murdering your best friend (whoever that is)?

I'm skeptical that an AI that would impose catastrophe on other thinking beings is really maximizing my utility.

What if somebody were to build an AI for hunting down and incarcerating murderers?

Would that "maximize your utility", or would you be uncomfortable with the fact that it would be "imposing catastrophe" on beings "whose desires conflict with [your] own"?

It seems to me that to truly maximize my utility, an AI would need to have consideration for the utility of other beings.

What if the "terminal values" (assuming that I know what you mean by that) of those beings made killing you (for laughs!) a great way to "maximize their utility"?

Perhaps my utility function gives more value than most to beings that don't share my values

But does that extraordinary consideration stretch to the people bent on killing other people for fun?

However, if an AI imposes truly catastrophic fates on other intelligent beings, my own utility function takes such a hit that I cannot consider it friendly.

Would your utility function take that hit if an AI saved your best friend from one of those kinds of people (the ones who like to kill other people for laughs)?

Roughly, a terminal value is a thing you value for its own sake.

This is contrasted with instrumental values, which are things you value only because they provide a path to terminal values.

For example: money, on this view, is something we value only instrumentally... having large piles of money with no way to spend it isn't actually what anyone wants.

Caveat: I should clarify that I am not sure terminal values actually exist, personally.

Desrtopa

Wikipedia has an article on it. [...] Things like me getting killed in the course of satisfying their utility functions, as you mentioned above, would be a big one. [...] I support a system where we precommit to actions such as imprisoning people who commit crimes to prevent them from committing crimes in the first place. My utility function doesn't get positive value out of retribution against them. If an AI that hunts down and incarcerates murderers is better at preventing people from murdering in the first place, I would be in favor of it, assuming no unforeseen side effects.
