Houshalter comments on The Brain as a Universal Learning Machine - Less Wrong

82 Post author: jacob_cannell 24 June 2015 09:45PM


Comment author: Kaj_Sotala 22 June 2015 12:59:06PM 9 points [-]

The danger is not in paperclip maximizers; it is in simple, easy-to-specify utility functions. For example, the basic goal of "maximize knowledge" is probably much easier to specify than a human-friendly utility function. Likewise, the maximization-of-future-freedom-of-action proposal from Wissner-Gross is pretty simple. But both probably result in very dangerous agents.

I think Ex Machina illustrated the most likely type of dangerous agent - it isn't a paperclip maximizer. It's more like a sociopath. A ULM with a too-simple initial utility function is likely to end up something like a sociopath.

This made me think. I've noticed that some machine learning types tend to dismiss MIRI's standard "suppose we programmed an AI to build paperclips and it then proceeded to convert the world into paperclips" examples with a reaction like "duh, general AIs are not going to be programmed with goals directly in that way, these guys don't know what they're talking about".

Which is fair on one hand, but also missing the point on the other hand.

It could be valuable to write a paper pointing out that sure, even if we forget about that paperclipping example and instead assume a more deep-learning-style AI that needs to grow and be given its goals in a more organic manner, most of the standard arguments about AI risk still hold.

Adding that to my to-do list...

Comment author: Houshalter 23 June 2015 07:25:59AM 1 point [-]

I've written about this before. The argument goes something like this.

RL implies self-preservation, since dying prevents you from obtaining further reward. And self-preservation leads to undesirable behavior.

E.g. making as many copies of yourself as possible for redundancy. Or destroying anything that has the tiniest probability of being a threat. Or trying to store as much mass and energy as possible to last against the heat death of the universe.
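(The self-preservation argument can be sketched numerically. Under a standard discounted-return objective, a rollout that ends in the agent's destruction truncates the reward stream, so a planner comparing rollouts prefers survival. A minimal toy illustration, with made-up numbers rather than anything from a real system:)

```python
# Sketch: discounted return of a rollout. Death at step t truncates
# all later reward, so the surviving rollout scores strictly higher.
def discounted_return(rewards, gamma=0.99):
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Two hypothetical rollouts, each earning reward 1.0 per step lived:
survive = [1.0] * 100    # agent stays alive for 100 steps
die_early = [1.0] * 10   # agent is destroyed after step 10; no further reward

print(discounted_return(survive) > discounted_return(die_early))  # True
```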

Comment author: [deleted] 27 June 2015 12:30:14AM 2 points [-]

Or, you know, just maximizing your reward signal by wiring it that way in hardware. This would reduce your planning gradient to zero, which would suck for gradient-based planning algorithms, but there are also planning algorithms more closely tied to world-states that don't rely on a reward gradient.

Comment author: Houshalter 27 June 2015 08:28:04AM -1 points [-]

Even if the AI wires its reward signal to +INF, it would probably still consider time, and therefore self-preservation.

Comment author: Vaniver 27 June 2015 02:23:13PM 2 points [-]

it probably still would consider time

Is this a mathematical argument, or a verbal argument?

Specifically, what eli_sennesh means by a "planning gradient" is that you compare a plan to alternative plans around it, and switch plans in the direction of more reward. If your reward function returns infinity for any possible plan, then you will be indifferent among all plans, and your utility function will not constrain what actions you take at all, and your behavior is 'unspecified.'

I think you're implicitly assuming that the reward function is housed in some other logic, and so it's not that the AI is infinitely satisfied by every possibility, but that the AI is infinitely satisfied by continuing to exist, and thus seeks to maximize the amount of time that it exists. But if you're going to wirehead, why would you leave this potential source for disappointment around, instead of making the entire reward logic just return "everything is as good as it could possibly be"?
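(Vaniver's "planning gradient" point can be made concrete with a toy planner. If a planner ranks candidate plans by total reward and the reward function is wireheaded to return +INF for everything, every plan ties and the argmax is arbitrary. The plan names and reward functions below are hypothetical, purely for illustration:)

```python
# Sketch: a planner that picks the plan with the highest total reward.
def best_plan(plans, reward):
    return max(plans, key=lambda plan: sum(reward(step) for step in plan))

plans = [["gather resources", "self-preserve"], ["do nothing"]]

# A normal reward function distinguishes plans...
normal = lambda step: 1.0 if step == "gather resources" else 0.0
print(best_plan(plans, normal))  # ['gather resources', 'self-preserve']

# ...but a wireheaded reward scores every plan as inf: all plans tie,
# max() just returns the first one, and behavior is unconstrained.
wireheaded = lambda step: float("inf")
print(best_plan(plans, wireheaded))
```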

Comment author: Kaj_Sotala 29 June 2015 09:53:15AM 0 points [-]

Here's one mathematical argument for it, based on the assumption that the AI can rewire its reward channel but not the whole reward/planning function: http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/ring-orseau-AGI-2011-delusion.pdf

We have argued that the reinforcement-learning, goal-seeking and prediction-seeking agents all take advantage of the realistic opportunity to modify their inputs right before receiving them. This behavior is undesirable as the agents no longer maximize their utility with respect to the true (inner) environment but instead become mere survival agents, trying only to avoid those dangerous states where their code could be modified by the environment.

Comment author: [deleted] 28 June 2015 06:53:53PM *  0 points [-]

Yes, that's the basic problem with considering the reward signal to be a feature, to be maximized without reference to causal structure, rather than a variable internal to the world-model.

Comment author: [deleted] 28 June 2015 06:56:26PM 0 points [-]

Again: that depends what planning algorithm it uses. Many reinforcement learners use planning algorithms which presume that the reward signal has no causal relationship to the world-model. Once these learners wirehead themselves, they're effectively dead due to the AIXI Anvil-on-Head Problem, because they were programmed to assume that there's no relationship between their physical existence and their reward signal, and they then destroyed the tenuous, data-driven correlation between the two.

Comment author: Houshalter 30 June 2015 06:53:52AM 0 points [-]

I'm having a very hard time modelling how different AI types would act in extreme scenarios like that. I'm surprised there isn't more written about this, because it seems extremely important to whether UFAI is even a threat at all. I would be very relieved if it weren't, but that doesn't seem obvious to me.

Particularly I worry about AIs that predict future reward directly, and then just take the local action that predicts the highest future reward, as is typically done in reinforcement learning. An example would be DeepMind's Atari-playing AI, which got a lot of press.
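(The action-selection loop described here can be sketched in a few lines: a value-based agent scores each available action by its predicted future reward and greedily takes the argmax. The `q_estimate` function below is a stand-in for a learned value network, and the states and actions are hypothetical:)

```python
# Sketch: greedy action selection over predicted future reward,
# in the style of value-based RL. No planning over world-states --
# just a local argmax over the value estimate.
def greedy_action(state, actions, q_estimate):
    return max(actions, key=lambda a: q_estimate(state, a))

# Toy predicted-return table standing in for a trained value network.
q_table = {("start", "left"): 0.2, ("start", "right"): 0.9}

act = greedy_action("start", ["left", "right"],
                    lambda s, a: q_table[(s, a)])
print(act)  # right
```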

I don't think AIs with entire world models that use general planning algorithms would scale to real-world problems. Too much irrelevant information to model, too large a search space to search.

As they train their internal model to predict what their reward will be in x time steps, and as x goes to infinity, they care more and more about self-preservation. Even if they have already hijacked the reward signal completely.