somervta comments on The genie knows, but doesn't care - Less Wrong

Post author: RobbBB 06 September 2013 06:42AM (54 points)


Comment author: Transfuturist 03 September 2013 09:00:04PM -1 points

What if the AI's utility function is to find the right utility function, being guided along the way? Its goals could include learning to understand us, obeying us, and predicting what we would want/like/approve, gradually moving its object-level goals toward what would satisfy humanity. In other words, a probabilistic utility function with a great amount of uncertainty and a great reluctance to change, i.e. stability.

Regardless of the above questions/statement, I think much of the complexity of human utility comes from complexities of belief.

If we offload the complexity of the AI's utility function into very uncertainly defined concepts, and give it a reluctance to do anything but observe given such little data... I don't know, though. This has been something I've been sitting on for a while; lambast me.

As one last thing, I think the best kind of FAI would be a singleton with a meta-utility function, i.e. society's utility function. I think one part of Friendliness would be determining a utility function for society, specifying how people may interfere with each other and under what circumstances, and then building the genie's utility function within the singleton's constraints.

Please critique. If my ideas are as unclear as I think they may be (I'm sick), please mention it.

Comment author: somervta 04 September 2013 01:14:23AM 4 points

What if the AI's utility function is to find the right utility function

Coding your notion of 'right' is more difficult than you think. This is, essentially, what CEV is - an attempt at figuring out how an FAI can find the 'right' utility function.

Its goals could be such as learning to understand us, obey us, and predict what we might want/like/approve, moving its object-level goals to what would satisfy humanity? In other words, a probabilistic utility function with great amounts of uncertainty, and great amounts of apprehension to change, or stability.

You're talking about normative uncertainty, which is a slightly different problem from epistemic uncertainty. The easiest way to do this would be to reduce the problem to an epistemic one (these are the characteristics of the correct utility function; now reason probabilistically about which of these candidate functions it is), but that still has the action problem: an agent takes actions based on its utility function, and if it has a weighting over all utility functions, it may act in undesirable ways, particularly if it doesn't quickly converge to a single solution. There are a few other problems I could see with that approach. The original specification of 'correctness' has to be almost Friendliness-complete; it must be specific enough to pick out a single function (or perhaps many functions, all of which are what we want to want), without being compatible with any undesirable solutions. Also, a seed AI may not be able to follow the specification correctly, so a superintelligence is going to have to have some well-specified goal along the lines of "increase your capability without doing anything bad, until you have the ability to solve this problem, and then adopt the solution as your utility function". You may have noticed a familiar problem in the 'without doing anything bad' part of that (English - remember, we have to be able to code all of this) sentence.
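The "weighting over utility functions" action problem can be made concrete with a toy sketch (all names and numbers here are invented for illustration, not from the comment): the agent holds a probability weighting over candidate utility functions and picks the action with the highest expected utility under that mixture. A modest weight on one wrong, high-stakes candidate can dominate the decision before the weighting has converged.

```python
# Toy model of acting under a weighting over candidate utility functions.
# candidates: list of (probability, utility_function) pairs.

def expected_utility(action, candidates):
    """Expected utility of an action under the mixture of candidates."""
    return sum(p * u(action) for p, u in candidates)

def choose(actions, candidates):
    """Pick the action maximizing expected utility under the mixture."""
    return max(actions, key=lambda a: expected_utility(a, candidates))

# Two candidate utility functions the agent is uncertain between
# (dict.get used as a simple action -> utility map):
u_true = {"help": 10, "seize_resources": -10}.get    # what we actually want
u_wrong = {"help": 1, "seize_resources": 100}.get    # an undesirable candidate

# Even with 70% weight on the correct function, the wrong candidate's
# extreme payoff controls the decision:
candidates = [(0.7, u_true), (0.3, u_wrong)]
print(choose(["help", "seize_resources"], candidates))  # -> seize_resources
```

Under the true function alone the agent would pick "help" (10 vs -10); the 30% weight on the bad candidate flips it, which is exactly the worry about acting before convergence.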

Comment author: Transfuturist 04 September 2013 03:43:13AM -1 points

What if the AI's utility function is to find the right utility function

Coding your appreciation of 'right' is more difficult than you think.

I mean, instead of coding it, have it be uncertain about what is "right," and to guide itself using human claims. I'm thinking of the equivalent of something in EY's CFAI, but I've forgotten the terminology.

In other words, a meta-utility function. Why can't it weight actions based on what we as a society want/like/approve/consent/condone? A behavioristic learner, with reward/punishment and an intention to preserve the semantic significance of the reward/punishment channel.

if it has a weighting over all utility functions, it may act in undesirable ways, particularly if it doesn't quickly converge to a single solution.

When I said uncertainty, I was also implying inaction. I suppose inaction could be an undesirable way to act, but it's better to get it right slowly than to get it wrong very quickly. What I'm describing isn't really a utility function; it's more like a policy, or a policy function. Its policy would be volatile, or at least more volatile than LW's common understanding of a set-in-stone utility function.

If a utility function really needs to be pinpointed so exactly, surrounded by death and misery on all sides, why are we using a utility function to decide action? There are other approaches. Where did LW's/EY's concept of utility function come from, and why did they assume it was an essential part of AI?

Comment author: gattsuru 04 September 2013 07:05:46PM 4 points

Why can't it weight actions based on what we as a society want/like/approve/consent/condone? A behavioristic learner, with reward/punishment and an intention to preserve the semantic significance of the reward/punishment channel.

Most obviously, it's very easy for a powerful AI to take unexpected control of the reward/punishment channel, and trivial for a superintelligent AGI to do so in Very Bad ways. You've tried to block the basic version of this -- an AGI pressing its own "society liked this" button -- with the phrase 'semantic significance', but that's not really a codable concept. If the AGI isn't allowed to press the button itself, it might build a machine that would do so. If it isn't allowed to do that, it might wirehead a human into doing so. If it isn't allowed /that/, it might put a human near a Paradise Machine and only let them into the box when the button had been pressed. If the AGI's reward is based on the number of favorable news reports, now you have an AGI that's rewarded for manipulating its own media coverage. So on, and so forth.

The sort of semantic significance you're talking about is a pretty big part of Friendliness theory.

The deeper problem is that the things our society wants aren't necessarily Friendly, especially when extrapolated. One of the secondary benefits of Friendliness research is that it requires the examination of our own interests.

Its policy would be volatile, or at least, more volatile than the common understanding LW has of a set-in-stone utility function.

The 'set-in-stone' nature of a utility function is actually a desired benefit, albeit a difficult one to achieve (Löb's Problem, and the more general issue of value drift). A machine with undirected volatility in its utility function will vary randomly in its choices, and there are orders of magnitude more wrong random answers than correct ones on this matter.

If you can direct the drift, that's less of an issue, but then you could just make /that/ direction the utility function.

Where did LW's/EY's concept of utility function come from, and why did they assume it was an essential part of AI?

The basic idea of goal maximization is a fairly common thing when working with evolutionary algorithms (see XKCD for a joking example), because it's such a useful model. While there are other types of possible minds, maximizers of /some/ kind with unbounded or weakly bounded potential are the most relevant to MIRI's concerns because they have the greatest potential for especially useful and especially harmful results.

Comment author: DSimon 04 September 2013 12:53:34PM 1 point

Why can't it weight actions based on what we as a society want/like/approve/consent/condone?

Human society would not do a good job being directly in charge of a naive omnipotent genie. Insert your own nightmare scenario examples here, there are plenty to choose from.

What I'm describing isn't really a utility function, it's more like a policy, or policy function. Its policy would be volatile, or at least, more volatile than the common understanding LW has of a set-in-stone utility function.

What would be in charge of changing the policy?

Comment author: Transfuturist 04 September 2013 04:47:56PM -1 points

Why can't it weight actions based on what we as a society want/like/approve/consent/condone?

Human society would not do a good job being directly in charge of a naive omnipotent genie. Insert your own nightmare scenario examples here, there are plenty to choose from.

But that doesn't describe humanity being directly in charge. It only describes a small bit of influence for each person, and while groups would have leverage, that doesn't mean that a majority rejecting, say, homosexuality gets to say what LGB people can and can't do or be.

What I'm describing isn't really a utility function, it's more like a policy, or policy function. Its policy would be volatile, or at least, more volatile than the common understanding LW has of a set-in-stone utility function.

What would be in charge of changing the policy?

The metautility function I described.

What is a society's intent? What should a society's goals be, and how should it relate to the goals of its constituents?

Comment author: Lumifer 04 September 2013 05:23:35PM 6 points

that doesn't mean a majority rejecting, say, homosexuality, gets to say what LGB people can and can't do/be.

I think it means precisely that if the majority feels strongly enough about it.

For a quick example s/homosexuality/pedophilia/

Comment author: Transfuturist 04 September 2013 06:55:27PM 2 points

Good point. I think I was reluctant to use pedophilia as an example because I'm trying to defend this argument, and claiming it could allow pedophilia is not usually convincing. RAT - 1 for me.

I'll concede that point. But my questions aren't rhetorical, I think. There is no objective morality, and EY seems to be trying to get around that. Concessions must be made.

I'm thinking that the closest thing we could have to CEV is a social contract based on Rawls' veil of ignorance, adjusted with a live runoff of supply and demand (i.e. the fewer people who want slavery, the more likely it is that someone who wants slavery would become a slave, so prospective slaveowners would be less likely to approve of slavery on the grounds that they themselves do not want to be slaves; meanwhile, people who want to become slaves get what they want as well - by no means is this a rigorous definition or claim), in a post-scarcity economy, with sharding of some sort (as in CelestAI sharding, where parts of society that contribute negative utility to an individual are effectively invisible to that individual; there was an argument on LW that CEV would be impossible without some element of separation like this).
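The runoff rule as stated can be written down minimally (this rendering, including the linear form, is my own guess at what's intended, not a rigorous model): an approver's chance of being assigned the undesirable role rises as overall approval falls, so approving becomes personally risky exactly when society rejects the practice.

```python
# Minimal rendering of the proposed "live runoff" rule: the chance that
# someone who approves of slavery is assigned the slave role is taken,
# for illustration, to be one minus the overall approval rate.

def p_assigned_slave(approval_rate):
    """Chance an approver ends up a slave, under the proposed rule."""
    assert 0.0 <= approval_rate <= 1.0
    return 1.0 - approval_rate

for rate in (0.9, 0.5, 0.1):
    print(f"approval {rate:.0%} -> P(approver becomes slave) = "
          f"{p_assigned_slave(rate):.0%}")
```

Note the deterrent only works when the undesirable role is the likely outcome; as the next comment points out, practices with few desirable slots (e.g. aristocracy) may invert the incentive.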

Comment author: Gurkenglas 04 September 2013 09:25:08PM 2 points

The fewer people who want aristocracy, the more likely it is that someone who wants aristocracy would become a noble, so prospective nobles would be more likely to approve of aristocracy on the grounds that they themselves want to be nobles?

Comment author: Transfuturist 04 September 2013 11:44:40PM -1 points

The fewer people who want aristocracy, the more likely it is that someone who wants aristocracy would become a peon, so prospective nobles would be less likely to approve of aristocracy on the grounds that they themselves do not want to be peons.

I have to work this out. You have a good point.